Abstract. The convergence rate of stochastic gradient search is analyzed in this paper. Using
arguments based on differential geometry and Lojasiewicz inequalities, tight bounds on the
convergence rate of general stochastic gradient algorithms are derived. As opposed to the existing results,
the results presented in this paper allow the objective function to have multiple, non-isolated minima,
impose no restriction on the values of the Hessian (of the objective function) and do not require the
algorithm estimates to have a single limit point. Applying these new results, the convergence rate
of recursive prediction error identification algorithms is studied. The convergence rate of supervised
and temporal-difference learning algorithms is also analyzed using the results derived in the paper.

1. Introduction. Stochastic gradient algorithms are recursive optimization
methods of the stochastic approximation type. They are commonly used to
compute minima (or maxima) of a function whose values are available only through
noise-corrupted observations. They have found a wide range of applications in areas
such as automatic control, system identification, signal processing, machine learning,
operations research, statistical inference, economics and management (to name a few).
For further details, see [S], [IS], [IS], El, US], EZ], EH] and the references cited 
therein. 

Due to their practical importance, the asymptotic behavior of stochastic gradient
algorithms has been thoroughly studied in a large number of papers and books.
Significant attention has been given to the rate of convergence, as this property
directly characterizes the efficiency and enables the construction of reliable stopping
rules (see E], [19 , |lH], ES], ES] and the references given therein). Although the 
existing results on the convergence rate provide a good insight into the efficiency and
asymptotic behavior of stochastic gradient algorithms, they hold under very restrictive
conditions. More specifically, the existing results require the algorithm estimates
to converge to an isolated minimum of the objective function at which the Hessian
(of the objective function) is strictly positive definite. Unfortunately, such conditions
are practically impossible to verify for complex, high-dimensional and highly nonlinear
stochastic gradient algorithms.

In this paper, the rate of convergence of stochastic gradient algorithms is analyzed
for the case when the objective function has multiple, non-isolated minima (note
that the Hessian can be only semi-definite at a non-isolated minimum) and when the
algorithm estimates do not necessarily converge to a single limit point. Using arguments
based on differential geometry and Lojasiewicz inequalities, relatively tight
upper bounds on the convergence rate are derived. The obtained results cover a broad
class of complex stochastic gradient algorithms. We show how they can be used to
evaluate the convergence rate of recursive prediction error algorithms for identification
of linear stochastic dynamical systems. We also show how the convergence rate
of supervised and temporal-difference learning algorithms can be assessed using the
results derived in the paper.

The paper is organized as follows. The main results are presented in Section 2,
where stochastic gradient algorithms with additive noise are considered. In Section
3, the convergence rate of stochastic gradient algorithms with Markovian dynamics
is analyzed. Sections 4 - 6 are devoted to examples of the results presented in
Sections 2 and 3. In Section 4, supervised learning algorithms for feedforward neural
networks and their convergence rate are studied, while the rate of convergence of
temporal-difference learning algorithms is considered in Section 5. The convergence
rate of recursive prediction error algorithms for the identification of linear stochastic
systems is analyzed in Section 6. Sections 7 - 11 contain the proofs of the results
presented in Sections 2 - 6.

2. Main Results. In this section, the rate of convergence of the following algorithm
is analyzed:

$$\theta_{n+1} = \theta_n - \alpha_n (\nabla f(\theta_n) + w_n), \quad n \ge 0. \tag{2.1}$$

In this recursion, $f : \mathbb{R}^{d_\theta} \to \mathbb{R}$ is a differentiable function, $\{\alpha_n\}_{n\ge 0}$ is a sequence
of positive real numbers, $\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable defined on a
probability space $(\Omega, \mathcal{F}, P)$, and $\{w_n\}_{n\ge 0}$ is an $\mathbb{R}^{d_\theta}$-valued stochastic process defined
on the same probability space. To allow more generality, we assume that for each
$n \ge 0$, $w_n$ is a random function of $\theta_0, \ldots, \theta_n$. In the area of stochastic optimization,
recursion (2.1) is known as a stochastic gradient algorithm (or stochastic gradient
search), while function $f(\cdot)$ is referred to as an objective function. For further details
see [21], [IH] and references given therein. 
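
To make recursion (2.1) concrete, the following minimal sketch runs it on a toy objective with non-isolated minima. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the objective $f(\theta) = (\|\theta\|^2 - 1)^2/4$ (whose minima form the unit circle), the step sizes $\alpha_n = 0.1/(n+1)^{0.6}$, and i.i.d. Gaussian noise $w_n$.

```python
import math
import random


def grad_f(theta):
    """Gradient of the illustrative objective f(theta) = (||theta||^2 - 1)^2 / 4.

    Its minima form the whole unit circle, so they are non-isolated.
    """
    s = sum(t * t for t in theta) - 1.0
    return [s * t for t in theta]


def stochastic_gradient_search(theta0, n_steps, a=0.6, scale=0.1, sigma=0.5, seed=0):
    """Iterate theta_{n+1} = theta_n - alpha_n * (grad f(theta_n) + w_n).

    Assumed choices (not from the paper): alpha_n = scale / (n + 1)**a and
    w_n i.i.d. zero-mean Gaussian with standard deviation sigma.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(n_steps):
        alpha = scale / (n + 1) ** a
        g = grad_f(theta)
        theta = [t - alpha * (gi + rng.gauss(0.0, sigma)) for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    final = stochastic_gradient_search([1.5, 0.5], 20000)
    print("final estimate:", final)
    print("distance to the unit circle:", abs(math.hypot(*final) - 1.0))
```

In this toy run the estimates approach the set of minima rather than any preselected point on it, which is the situation the results of this section are designed to cover.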

Throughout the paper, unless otherwise stated, the following notation is used.
The Euclidean norm is denoted by $\|\cdot\|$, while $d(\cdot,\cdot)$ stands for the distance induced
by the Euclidean norm. $S$ and $C$ are the sets of stationary points and critical values of $f(\cdot)$,
i.e.,

$$S = \{\theta \in \mathbb{R}^{d_\theta} : \nabla f(\theta) = 0\}, \quad C = \{f(\theta) : \theta \in S\}.$$

Sequence $\{\gamma_n\}_{n\ge 0}$ is defined by $\gamma_0 = 0$ and

$$\gamma_n = \sum_{i=0}^{n-1} \alpha_i$$

for $n \ge 1$. For $t \in (0, \infty)$ and $n \ge 0$, $a(n, t)$ is an integer defined as

$$a(n, t) = \max\{k \ge n : \gamma_k - \gamma_n \le t\}.$$
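
The two bookkeeping quantities above are easy to compute for a finite run. The sketch below is illustrative only: it assumes the inequality in the definition of $a(n, t)$ is non-strict, and it truncates the maximum at the last available index because only finitely many step sizes are supplied. The function names are hypothetical.

```python
from itertools import accumulate


def gamma_sequence(alphas):
    """Return [gamma_0, ..., gamma_N], where gamma_0 = 0 and
    gamma_n = alpha_0 + ... + alpha_{n-1}."""
    return [0.0] + list(accumulate(alphas))


def a(n, t, gammas):
    """Largest k >= n with gamma_k - gamma_n <= t.

    The search is truncated at the last index of `gammas`, since only a
    finite horizon of step sizes is available in this sketch.
    """
    k = n
    while k + 1 < len(gammas) and gammas[k + 1] - gammas[n] <= t:
        k += 1
    return k


if __name__ == "__main__":
    gammas = gamma_sequence([1.0] * 10)
    print(gammas[:4])
    print(a(2, 3.0, gammas))
```

Intuitively, $\gamma_n$ plays the role of elapsed "algorithmic time", and $a(n, t)$ is the last iteration reached within $t$ time units after iteration $n$.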

Algorithm (2.1) is analyzed under the following assumptions:

Assumption 2.1. $\lim_{n\to\infty} \alpha_n = 0$ and $\sum_{n=0}^{\infty} \alpha_n = \infty$.

Assumption 2.2. There exists a real number $r \in (0, \infty)$ such that

$$w = \limsup_{n\to\infty} \max_{n \le k < a(n,1)} \Big\| \sum_{i=n}^{k} \alpha_i \gamma_i^{r} w_i \Big\| < \infty$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$.

Assumption 2.3. For any compact set $Q \subset \mathbb{R}^{d_\theta}$ and any $a \in f(Q)$, there exist
real numbers $\delta_{Q,a} \in (0,1]$, $\mu_{Q,a} \in (1,2]$, $M_{Q,a} \in [1,\infty)$ such that

$$|f(\theta) - a| \le M_{Q,a} \|\nabla f(\theta)\|^{\mu_{Q,a}} \tag{2.2}$$

for all $\theta \in Q$ satisfying $|f(\theta) - a| \le \delta_{Q,a}$.

Assumption 2.4. For any compact set $Q \subset \mathbb{R}^{d_\theta}$, there exist real numbers $\nu_Q \in (0,1]$,
$N_Q \in [1,\infty)$ such that

$$d(\theta, S) \le N_Q \|\nabla f(\theta)\|^{\nu_Q} \tag{2.3}$$

for all $\theta \in Q$.

Remark. In order to show that Assumption 2.3 holds, it is sufficient to demonstrate
its 'local version,' i.e., that there exists an open vicinity $U$ of $S$ with the following
property: For any compact set $Q \subset U$ and any $a \in f(Q)$, there exist real numbers
$\delta_{Q,a} \in (0,1]$, $\mu_{Q,a} \in (1,2]$, $M_{Q,a} \in [1,\infty)$ such that (2.2) holds for all $\theta \in Q$ satisfying
$|f(\theta) - a| \le \delta_{Q,a}$ (for details see the appendix at the end of the paper). Similar
conclusions apply to Assumption 2.4.

Assumption 2.1 corresponds to the sequence $\{\alpha_n\}_{n\ge 0}$ and is widely used in the
asymptotic analysis of stochastic gradient and stochastic approximation algorithms.
Assumption 2.2 is a noise condition. In this or a similar form, it is involved in most of
the results on the convergence rate of stochastic gradient search and stochastic approximation.
It holds for algorithms with Markovian dynamics (see the next section). It is
also satisfied when $\{w_n\}_{n\ge 0}$ is a martingale-difference sequence. Assumptions
2.3 and 2.4 are related to the stability of the gradient flow $d\theta/dt = -\nabla f(\theta)$, or more
specifically, to the geometry of the set of stationary points $S$. In the area of differential
geometry, relations (2.2) and (2.3) are known as the Lojasiewicz inequalities (see [20]
and [H] for details). They hold if $f(\cdot)$ is analytic or subanalytic in an open vicinity
of $S$ (see [B], [^T] for the proof; for the form of the Lojasiewicz inequality appearing in
Assumption 2.3, see [15, Theorem LI, p. 775]; for the definition and properties of analytic
and subanalytic functions, consult [14]). Although analyticity and subanalyticity
are fairly strong conditions, they hold for the objective functions of many stochastic
gradient algorithms commonly used in the areas of system identification, signal
processing, machine learning, operations research and statistical inference. E.g., in this
paper, we show that the objective functions associated with supervised and temporal-difference
learning are analytic (Sections 4 and 5). We also demonstrate the same
property for recursive prediction error identification (Section 6). Furthermore, in [31],
we show analyticity for the objective functions associated with recursive identification
methods for hidden Markov models. It is also worth mentioning that the objective
functions associated with recursive algorithms for principal and independent component
analysis (as well as with many other adaptive signal processing algorithms) are
usually polynomial or rational, and hence, analytic, too (see e.g., [TU] and references
cited therein).

In order to state the main results of this section, we need further notation. For a
compact set $Q \subset \mathbb{R}^{d_\theta}$, $C_Q \in [1, \infty)$ stands for an upper bound of $\|\nabla f(\cdot)\|$ on $Q$ and
for a Lipschitz constant of $\nabla f(\cdot)$ on the same set. $\hat{A}$ denotes the set of accumulation
points of $\{\theta_n\}_{n\ge 0}$ (notice that $\hat{A}$ is a random set), while

$$\hat{f} = \liminf_{n\to\infty} f(\theta_n).$$

$\hat{Q}$ is a random set defined as





otherwise 
where $\rho$ is an arbitrary positive (deterministic or random) quantity. $\hat{\delta}$, $\hat{\mu}$, $\hat{\nu}$, $\hat{C}$, $\hat{M}$
and $\hat{N}$ are random quantities defined by

6^Sqj, fi^f^Qj, i> = MqJ /2, C^Cq, = ^ = ^0 (2-4) 

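when $\sup_{n\ge 0} \|\theta_n\| < \infty$ and by

$$\hat{\delta} = 1, \quad \hat{\mu} = 2, \quad \hat{\nu} = 1, \quad \hat{C} = 1, \quad \hat{M} = 1, \quad \hat{N} = 1 \tag{2.5}$$

otherwise (symbol $\hat{\ }$ is used to emphasize the dependence on $\hat{f}$ and $\hat{Q}$). Moreover, let

$$\hat{r} = \begin{cases} 1/(2 - \hat{\mu}), & \text{if } \hat{\mu} < 2 \\ \infty, & \text{if } \hat{\mu} = 2 \end{cases}$$

$$\hat{p} = \hat{\mu} \min\{r, \hat{r}\}, \quad \hat{q} = \hat{\nu} \min\{r, \hat{r}\}. \tag{2.6}$$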
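
As a quick numerical aid, the sketch below evaluates the exponents of (2.6) for given Lojasiewicz coefficients. It assumes the reading $\hat{r} = 1/(2 - \hat{\mu})$ for $\hat{\mu} < 2$, $\hat{r} = \infty$ for $\hat{\mu} = 2$, $\hat{p} = \hat{\mu} \min\{r, \hat{r}\}$ and $\hat{q} = \hat{\nu} \min\{r, \hat{r}\}$; the function name and argument checks are hypothetical.

```python
import math


def rate_exponents(mu, nu, r):
    """Return (r_hat, p_hat, q_hat) for Lojasiewicz exponents mu in (1, 2],
    nu in (0, 1] and noise exponent r > 0."""
    if not 1.0 < mu <= 2.0:
        raise ValueError("mu must lie in (1, 2]")
    if not 0.0 < nu <= 1.0:
        raise ValueError("nu must lie in (0, 1]")
    if r <= 0.0:
        raise ValueError("r must be positive")
    r_hat = math.inf if mu == 2.0 else 1.0 / (2.0 - mu)
    m = min(r, r_hat)
    return r_hat, mu * m, nu * m


if __name__ == "__main__":
    # Quadratic-like case (mu = 2): the noise exponent r alone limits the rate.
    print(rate_exponents(2.0, 1.0, 0.5))
    # Flatter minimum (mu = 1.5): the gradient flow itself limits the rate.
    print(rate_exponents(1.5, 0.5, 3.0))
```

The two calls illustrate the two regimes discussed later in this section: when $\hat{r} > r$ the noise dominates, and when $\hat{r} \le r$ the geometry of the objective near its stationary points dominates.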



Furthermore, let 




Remark. Since f e f{Q) when sup„>g \\9,i\\ < oo, it is obvious that random 
quantities S, (1, 0, p, q, r, C , M , N are well-defined. Moreover, it is easy to conclude 
that inequalities Q<5<l,l<(i<2,p> min{l, r}, g > 1, f > 1, 1 < C, M, N < oo 
hold everywhere (i.e., on entire OJ. It can also be demonstrated that (Lojasiewicz coef- 
ficients) Sq^a, fJ-Q.a, VQ, Mg^a, Nq havB 'measurable versions' such that 5, jl, v, p, q, 
f, M, N are random variables in probability space (Cl,J-,P) (i.e., measurable with re- 
spect to T ; details are provided in the appendix at the end of the paper). Furthermore, 
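as a consequence of Assumption 2.3, we have

$$|f(\theta) - \hat{f}| \le \hat{M} \|\nabla f(\theta)\|^{\hat{\mu}} \tag{2.7}$$

on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$ for all $\theta \in \hat{Q}$ satisfying $|f(\theta) - \hat{f}| \le \hat{\delta}$.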

Our main results on the convergence and convergence rate of the recursion (2.1)
are contained in the next two theorems.

Theorem 2.1. Let Assumptions 2.1 - 2.3 hold. Then, $\lim_{n\to\infty} \nabla f(\theta_n) = 0$ and
$\lim_{n\to\infty} f(\theta_n) = \hat{f}$ w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$.

Theorem 2.2. Let Assumptions 2.1 - 2.3 hold. Then, there exists a random
quantity $\hat{K}$ (which is a deterministic function of $\hat{C}$, $\hat{M}$) such that $1 \le \hat{K} < \infty$ everywhere
and such that

limsup7^||V/(0„)ll' < K{(l>{w)Y, (2.8) 
limsup7jJ|/(0„) - /I < K{4>{w)Y (2.9) 

n — ^oo 

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$. If, additionally, Assumption 2.4 is satisfied, then there
exists another random quantity $\hat{L}$ (which is a deterministic function of $\hat{C}$, $\hat{M}$, $\hat{N}$) such
that $1 \le \hat{L} < \infty$ everywhere and such that

limsup7«d(6l„, S) < L{(t)(w)y (2.10) 
w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$.

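The proofs are provided in Section 7. As an immediate consequence of the previous
theorems, we get the following corollaries:

Corollary 2.3. Let Assumptions 2.1 - 2.4 hold. Then, the following is true:

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = o(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = o(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{w = 0, \hat{r} > r\}$, and

$$\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = O(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = O(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{w = 0, \hat{r} > r\}^c$.

Corollary 2.4. Let Assumptions 2.1 - 2.3 hold. Then,

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p}), \quad d(f(\theta_n), C) = o(\gamma_n^{-p})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$, where $p = \min\{1, r\}$.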

In the literature on stochastic and deterministic optimization, the asymptotic
behavior of gradient search is usually characterized by the gradient, objective and
estimate convergence, i.e., by the convergence of sequences $\{\nabla f(\theta_n)\}_{n\ge 0}$, $\{f(\theta_n)\}_{n\ge 0}$
and $\{\theta_n\}_{n\ge 0}$ (see e.g., [3], [5], [5S], [55] and references quoted therein). Similarly, the
convergence rate can be described by the rates at which $\{\nabla f(\theta_n)\}_{n\ge 0}$, $\{f(\theta_n)\}_{n\ge 0}$
and $\{\theta_n\}_{n\ge 0}$ tend to the sets of their limit points. Theorem 2.2 and Corollary 2.3
provide relatively tight upper bounds on these rates in terms of the asymptotic
properties of noise $\{w_n\}_{n\ge 0}$ and the gradient flow $d\theta/dt = -\nabla f(\theta)$. Basically, the
theorem and its corollary claim that the convergence rate of $\{\|\nabla f(\theta_n)\|^2\}_{n\ge 0}$ and
$\{f(\theta_n)\}_{n\ge 0}$ is the slower of the rates $O(\gamma_n^{-\hat{\mu}\hat{r}})$ (the rate of the gradient flow $d\theta/dt =
-\nabla f(\theta)$ sampled at instants $\{\gamma_n\}_{n\ge 0}$) and $O(\gamma_n^{-\hat{\mu} r})$ (the rate of the noise averages
$\|\sum_{i=n}^{\infty} \alpha_i w_i\|^{\hat{\mu}}$). Apparently, the rates provided in Theorem 2.2 and Corollary
2.3 are of a local nature: They hold only on the event where algorithm (2.1) is stable
(i.e., where sequence $\{\theta_n\}_{n\ge 0}$ is bounded). Stating results on the convergence rate
in such a local form is quite reasonable for the following reasons. The stability
of stochastic gradient search is based on well-understood arguments which are rather
different from the arguments used in the analysis of the convergence rate. Moreover,
and more importantly, it is straightforward to get a global version of the rates provided
in Theorem 2.1 and Corollary 2.3 by combining the theorem with the methods used
to verify or ensure the stability (e.g., with the results of '1\ and [9|). 

Due to its practical and theoretical importance, the rate of convergence of stochastic
gradient search (and stochastic approximation) has been the subject of a large
number of papers and books (see [5], [IH], [IS], [21] and references cited
therein). Although the existing results provide a good insight into the asymptotic
behavior and efficiency of stochastic gradient algorithms, they are based on fairly
restrictive assumptions: Literally, they all require the objective function $f(\cdot)$ to have
an isolated minimum $\theta_*$ (sometimes even to be strongly unimodal) such that Hessian
$\nabla^2 f(\theta_*)$ is strictly positive definite and $\lim_{n\to\infty} \theta_n = \theta_*$ w.p.1. Unfortunately, in the
case of high-dimensional and highly nonlinear stochastic gradient algorithms (such as
online machine learning and recursive identification), it is hard (if not impossible)
to show even the existence of an isolated minimum, let alone the definiteness of
$\nabla^2 f(\cdot)$ and the point-convergence of $\{\theta_n\}_{n\ge 0}$. Relying on the Lojasiewicz inequalities,
Theorem 2.1 and Corollary 2.3 overcome these difficulties: Both the theorem and
its corollary allow the objective function $f(\cdot)$ to have multiple, non-isolated minima,
impose no restriction on the values of $\nabla^2 f(\cdot)$ (notice that $\nabla^2 f(\cdot)$ cannot be strictly
definite at a non-isolated minimum or maximum) and permit $\{\theta_n\}_{n\ge 0}$ to have multiple
limit points. Moreover, they cover a broad class of complex stochastic gradient
algorithms (see Sections 4 - 6; see also [31]). To the best of our knowledge, these
are the only results on the convergence rate with such features.

Regarding the results of Theorem 2.1 and Corollary 2.3, it is worth mentioning
that they are not just a combination of the Lojasiewicz inequalities and the existing
techniques for the asymptotic analysis of stochastic gradient search and stochastic
approximation. On the contrary, the existing techniques seem to be inapplicable to
the case of multiple non-isolated minima. The reason is that these
techniques crucially rely on the Lyapunov function $u(\theta) = (\theta - \theta_*)^T \nabla^2 f(\theta_*)(\theta - \theta_*)$,
where $\theta_*$ is an isolated minimum such that $\lim_{n\to\infty} \theta_n = \theta_*$ w.p.1 and $\nabla^2 f(\theta_*)$ is strictly
positive definite. Unfortunately, in the case of multiple, non-isolated minima, neither
does $\{\theta_n\}_{n\ge 0}$ necessarily have a single limit point (limit cycles can occur), nor can $\nabla^2 f(\cdot)$
be a strictly positive definite matrix. In order to overcome this problem, we use
a 'singular' Lyapunov function $v(\theta) = 1/(f(\theta) - \hat{f})^{1/p}$, where $p \in (0, \hat{\mu}/(2 - \hat{\mu})]$ and
$\theta \in \{\vartheta \in \mathbb{R}^{d_\theta} : f(\vartheta) > \hat{f}\}$. Although subtle techniques are needed to handle such a
Lyapunov function (see Section 7), $v(\cdot)$ provides an intuitively clear explanation of the
results of Theorem 2.2 and Corollary 2.3. The explanation is based on the heuristic
analysis of the following two cases.

Case 1: sup„>Q ||6'„|| < oo and liminf„^oo 7n''(/(6'n) - f) = 
In this case, there exists an increasing integer sequence {nk}k>o such that f{On^) < f 
for each fc > and lim„^oo Intifi^nk)^ f) = —co- Therefore, Assumption 12 .31 implies 



iim„^o,7;; j|v/(0„ 



cxD. Since max^; 



o{r„ 



there exists a large integer m 3> 1 such that f{0„i) < f and max„>„i 
II ^/(Sm) 11/2. Then, for n > a{m, 1), Taylor formula yields 



see Lemma [7T|) . 

IIEr=m"i^"*ll < 



fi0n) -fiOm) - iyfi0m)f ^ a,(V/(0,) + W,) 

i—m 

n-1 

^f{Om) - ||V/(0™)f (7„ - 7m) - (V/(0™))^ 



<fiO. 
<f{0. 



n-1 



(notice that 7„ — jm > 1)- Hence, f{On) < f{9„i) < / for n > a(m, 1), which is 
impossible as lim„^oo /(^n) = /• 

Case 2: sup„>o ||6'„|| < oo and Yivasnj)^^^ ln^{f{0„) - /) = oo. 
Similarly as in the previous case, there exists an increasing integer sequence {nk\k>o 
such that f{9nk) > f for each fc > and hm„^oo ln^{f{(^nk)^f) = oo- Consequently, 
Assumption 12.31 yields liuik^^^l,, ||V/(0„,,)|| = oo and 



l|V/(en 



.7;jiv/(0„j 

l|2 

> 



1 



if {On,) - fY+^/p ~ MV^fie^,) - /)i+i/p-2/A 

: / and maxfc>y 



for fc > 0. Since 1 + 1/p > 2/fi, lim„^oo /(6': 
ELn



0(7„'^), there exists a large integer m ^ 1 such that max„>,„ ||^"^^ aiWij] < 
\\Vf{e„,)\\/2, f{9,n)> f and 

\\^f{Om)r > 1 



Then, for any n > a{in, 1) satisfying > /, Tayfor formula implies 



n-l 



i—m 



^I'lPmjH ^ (7n— 7m)H ^ > aiWi 



n-l 



i—m 



1 

>— ^-77T(7n - 7m)- 

Thus, /(6'„) - / < (2pAf)2p(7„ - 7,„)-P for n > a(m, 1) (notice that /i > 1). 

Following the reasoning outlined in the above cases, it can easily be concluded
that the slower of $O(\gamma_n^{-p})$ and $O(\gamma_n^{-\hat{\mu} r})$ is the rate at which $f(\theta_n)$ tends to $\hat{f}$. Since
$p$ can be any number from $(0, \hat{\mu}\hat{r}]$ (in the proof of Theorem 2.1, Section 7, value
$p = \hat{p} = \hat{\mu} \min\{r, \hat{r}\}$ is used), it is also straightforward to deduce that $O(\gamma_n^{-\hat{p}})$ is the
convergence rate of $\{f(\theta_n)\}_{n\ge 0}$. In addition to this, the previously described heuristics
indicate that, in terms of $r$ and $\hat{\mu}$, $O(\gamma_n^{-\hat{p}})$ is probably the tightest estimate of the
convergence rate of $\{f(\theta_n)\}_{n\ge 0}$. The same conclusion is suggested by the following
two special cases:

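Case (a): $w_n = 0$ for each $n \ge 0$.
Due to Assumption 2.3, we have

$$\frac{d(f(\theta(t)) - \hat{f})}{dt} = -\|\nabla f(\theta(t))\|^2 \le -(1/\hat{M})^{2/\hat{\mu}} (f(\theta(t)) - \hat{f})^{2/\hat{\mu}}$$

for a solution $\theta(\cdot)$ of $d\theta/dt = -\nabla f(\theta)$ satisfying $\theta(t) \in \hat{Q}$ for all $t \in [0, \infty)$ and
$\lim_{t\to\infty} f(\theta(t)) = \hat{f}$. Consequently, $f(\theta(t)) - \hat{f} = O(t^{-\hat{\mu}/(2-\hat{\mu})}) = O(t^{-\hat{\mu}\hat{r}})$. As
$\{\theta_n\}_{n\ge 0}$ is asymptotically equivalent to $\theta(\cdot)$ sampled at time instances $\{\gamma_n\}_{n\ge 0}$, we
get $f(\theta_n) - \hat{f} = O(\gamma_n^{-\hat{\mu}\hat{r}})$. The same result is implied by Theorem 2.1 and Corollary
2.3.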
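
The noise-free rate can be checked on a one-dimensional toy example, which is an illustrative assumption and not taken from the paper: $f(\theta) = \theta^4/4$, for which $f(\theta) = \frac{1}{4}|\nabla f(\theta)|^{4/3}$, so the Lojasiewicz exponent is $4/3$ and the predicted decay exponent is $(4/3)/(2 - 4/3) = 2$. The gradient flow $d\theta/dt = -\theta^3$ has the closed-form solution $\theta(t) = \theta_0/\sqrt{1 + 2\theta_0^2 t}$, so $f(\theta(t))$ indeed decays like $t^{-2}$. The sketch compares this closed form with a simple Euler discretization.

```python
import math


def f(theta):
    """Illustrative objective f(theta) = theta^4 / 4 (flat minimum at 0)."""
    return 0.25 * theta ** 4


def grad_f(theta):
    return theta ** 3


def exact_flow(theta0, t):
    """Closed-form solution of d(theta)/dt = -theta^3 started at theta0."""
    return theta0 / math.sqrt(1.0 + 2.0 * theta0 ** 2 * t)


def euler_flow(theta0, t_end, dt=1e-3):
    """Explicit Euler approximation of the same gradient flow."""
    theta = theta0
    for _ in range(int(round(t_end / dt))):
        theta -= dt * grad_f(theta)
    return theta


if __name__ == "__main__":
    for t in (10.0, 100.0, 1000.0):
        # t^2 * f(theta(t)) approaches the constant 1/16, i.e. f decays like t^-2.
        print(t, t ** 2 * f(exact_flow(1.0, t)))
    print(euler_flow(1.0, 10.0), exact_flow(1.0, 10.0))
```

A strictly convex quadratic would instead decay exponentially, which corresponds to the limiting case in which the Lojasiewicz exponent equals 2.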

Case (b): $f(\theta) = \theta^T A \theta$ and $A$ is a strictly positive definite matrix.
Recursion (2.1) reduces to a linear stochastic approximation algorithm in this case.
For such an algorithm, it is known that the tightest estimate of the convergence rate
is $f(\theta_n) = O(\gamma_n^{-2r})$ if $w > 0$, and $f(\theta_n) = o(\gamma_n^{-2r})$ for $w = 0$ (see [20]). The same rate
is provided by Theorem 2.2 and Corollary 2.3.



3. Stochastic Gradient Algorithms with Markovian Dynamics. In order
to illustrate the results of Section 2 and to set up a framework for the analysis
carried out in Sections 4 - 6, we apply Theorems 2.1, 2.2 and Corollaries 2.3, 2.4
to stochastic gradient algorithms with Markovian dynamics. These algorithms are
defined by the following difference equation:

$$\theta_{n+1} = \theta_n - \alpha_n F(\theta_n, \xi_{n+1}), \quad n \ge 0. \tag{3.1}$$

In this recursion, $F : \mathbb{R}^{d_\theta} \times \mathbb{R}^{d_\xi} \to \mathbb{R}^{d_\theta}$ is a Borel-measurable function, while $\{\alpha_n\}_{n\ge 0}$
is a sequence of positive real numbers. $\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable defined
on a probability space $(\Omega, \mathcal{F}, P)$, while $\{\xi_n\}_{n\ge 0}$ is an $\mathbb{R}^{d_\xi}$-valued stochastic process
defined on the same probability space. $\{\xi_n\}_{n\ge 0}$ is a Markov process controlled by
$\{\theta_n\}_{n\ge 0}$, i.e., there exists a family of transition probability kernels $\{\Pi_\theta(\cdot, \cdot)\}_{\theta \in \mathbb{R}^{d_\theta}}$
defined on $\mathbb{R}^{d_\xi}$ such that

$$P(\xi_{n+1} \in B \mid \theta_0, \xi_0, \ldots, \theta_n, \xi_n) = \Pi_{\theta_n}(\xi_n, B)$$

w.p.1 for any Borel-measurable set $B \subseteq \mathbb{R}^{d_\xi}$ and $n \ge 0$. In the context of stochastic
gradient search, $F(\theta_n, \xi_{n+1})$ is regarded as an estimator of $\nabla f(\theta_n)$.

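The algorithm (3.1) is analyzed under the following assumptions.

Assumption 3.1. $\lim_{n\to\infty} \alpha_n = 0$, $\limsup_{n\to\infty} |\alpha_{n+1}^{-1} - \alpha_n^{-1}| < \infty$ and $\sum_{n=0}^{\infty} \alpha_n =
\infty$. There exists a real number $r \in (0, \infty)$ such that $\sum_{n=0}^{\infty} \alpha_n^2 \gamma_n^{2r} < \infty$.

Assumption 3.2. There exist a differentiable function $f : \mathbb{R}^{d_\theta} \to \mathbb{R}$ and a
Borel-measurable function $\tilde{F} : \mathbb{R}^{d_\theta} \times \mathbb{R}^{d_\xi} \to \mathbb{R}^{d_\theta}$ such that $\nabla f(\cdot)$ is locally Lipschitz
continuous and such that

$$F(\theta, \xi) - \nabla f(\theta) = \tilde{F}(\theta, \xi) - (\Pi\tilde{F})(\theta, \xi)$$

for each $\theta \in \mathbb{R}^{d_\theta}$, $\xi \in \mathbb{R}^{d_\xi}$, where $(\Pi\tilde{F})(\theta, \xi) = \int \tilde{F}(\theta, \xi') \Pi_\theta(\xi, d\xi')$.

Assumption 3.3. For any compact set $Q \subset \mathbb{R}^{d_\theta}$ and $s \in (0, 1)$, there exists a
Borel-measurable function $\varphi_{Q,s} : \mathbb{R}^{d_\xi} \to [1, \infty)$ such that

$$\max\{\|F(\theta, \xi)\|, \|\tilde{F}(\theta, \xi)\|, \|(\Pi\tilde{F})(\theta, \xi)\|\} \le \varphi_{Q,s}(\xi),$$
$$\|(\Pi\tilde{F})(\theta', \xi) - (\Pi\tilde{F})(\theta'', \xi)\| \le \varphi_{Q,s}(\xi) \|\theta' - \theta''\|^s$$

for all $\theta, \theta', \theta'' \in Q$, $\xi \in \mathbb{R}^{d_\xi}$.

Assumption 3.4. Given a compact set $Q \subset \mathbb{R}^{d_\theta}$ and $s \in (0, 1)$,

$$\sup_{n\ge 0} E\left(\varphi_{Q,s}^2(\xi_n) I_{\{\tau_Q \ge n\}} \mid \theta_0 = \theta, \xi_0 = \xi\right) < \infty$$

for all $\theta \in \mathbb{R}^{d_\theta}$, $\xi \in \mathbb{R}^{d_\xi}$, where $\tau_Q = \inf\{n \ge 0 : \theta_n \notin Q\}$.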

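The main results on the convergence rate of recursion (3.1) are in the next theorem.

Theorem 3.1. Let Assumptions 3.1 - 3.4 hold, and suppose that $f(\cdot)$ (introduced
in Assumption 3.2) satisfies Assumptions 2.3 and 2.4. Then,

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p}), \quad d(f(\theta_n), C) = o(\gamma_n^{-p})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$. Moreover, the following is true:

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = o(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = o(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{\hat{r} > r\}$, and

$$\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = O(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = O(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{\hat{r} \le r\}$.

The proof is provided in Section 8. $C$, $S$, $p$, $\hat{p}$, $\hat{q}$ and $\hat{r}$ are defined in Section 2.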
Assumption 3.1 is related to the sequence $\{\alpha_n\}_{n\ge 0}$. It holds if $\alpha_n = 1/n^a$ for
$n \ge 1$, where $a \in (1/2, 1]$ is a constant. On the other side, Assumptions 3.2 - 3.4
correspond to the stochastic process $\{\xi_n\}_{n\ge 0}$ and are quite standard for the asymptotic
analysis of stochastic approximation algorithms with Markovian dynamics. Assump- 
tions 13.21 - 13.41 have been introduced by Metivier and Priouret in [52] (see also (U 
Part II] ) , and later generalized by Kushner and his co-workers (see [TB] and references 
cited therein). However, neither the results of Metivier and Priouret, nor the results 
of Kushner and his co-workers provide any information on the convergence rate of 
stochastic gradient search in the case of multiple, non-isolated minima. 

Regarding Theorem 3.1, the following note is also in order. As already mentioned
in the beginning of the section, the purpose of the theorem is to illustrate the results
of Theorem 2.1 and to provide a framework for studying the examples presented in
the next sections. Since these examples fit well into the framework developed by
Metivier and Priouret, the more general assumptions and settings of [16] are not considered
here, in order to keep the exposition as concise as possible.

4. Example 1: Supervised Learning. In this section, online algorithms for
supervised learning in feedforward neural networks are analyzed using the results of
Theorems 2.2 and 3.1.

To state the problem of supervised learning and to define the corresponding algorithms,
we need the following notation. $N_1$ and $N_2$ are positive integers, while $d_\theta =
N_1(N_2 + 1)$. $\phi_1, \phi_2 : \mathbb{R} \to \mathbb{R}$ are differentiable functions, while $\psi_1, \ldots, \psi_{N_2} : \mathbb{R}^{d_x} \to \mathbb{R}$
are Borel-measurable functions. For $a'_1, \ldots, a'_{N_1} \in \mathbb{R}$, $a''_{1,1}, \ldots, a''_{N_1,N_2} \in \mathbb{R}$, $x \in \mathbb{R}^{d_x}$,
let

$$G_\theta(x) = \phi_1\left( \sum_{i_1=1}^{N_1} a'_{i_1} \phi_2\left( \sum_{i_2=1}^{N_2} a''_{i_1,i_2} \psi_{i_2}(x) \right) \right),$$

where $\theta = [a'_1 \cdots a'_{N_1} \ a''_{1,1} \cdots a''_{N_1,N_2}]^T$. Moreover, $\pi(\cdot, \cdot)$ denotes a probability measure
on $\mathbb{R}^{d_x} \times \mathbb{R}$, while

$$f(\theta) = \frac{1}{2} \int (y - G_\theta(x))^2 \pi(dx, dy)$$

for $\theta \in \mathbb{R}^{d_\theta}$. Then, the mean-square error based supervised learning in feedforward
neural networks can be described as the minimization of $f(\cdot)$ in a situation when only
samples from $\pi(\cdot, \cdot)$ are available. In this context, $G_\theta(\cdot)$ represents the input-output
function (i.e., the architecture) of the feedforward neural network to be trained. $\phi_1(\cdot)$
and $\phi_2(\cdot)$ are the network activation functions, while $\theta$ is the vector of the network
parameters to be tuned through the process of supervised learning. For more details
on neural networks and supervised learning, see e.g., [11], [12] and references cited
therein.

Function $f(\cdot)$ is usually minimized by the following stochastic gradient algorithm:

$$\theta_{n+1} = \theta_n + \alpha_n (y_n - G_{\theta_n}(x_n)) H_{\theta_n}(x_n), \quad n \ge 0. \tag{4.1}$$

In this recursion, $\{\alpha_n\}_{n\ge 0}$ is a sequence of positive real numbers, while $H_\theta(\cdot) =
\nabla_\theta G_\theta(\cdot)$. $\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$,
while $\{(x_n, y_n)\}_{n\ge 0}$ is an $\mathbb{R}^{d_x} \times \mathbb{R}$-valued stochastic process defined on the same probability
space. In the context of supervised learning, $\{(x_n, y_n)\}_{n\ge 0}$ is regarded as a
training sequence.
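
A single step of recursion (4.1) can be sketched for a small network as follows. The concrete choices are assumptions made only for illustration: $\phi_1$ is the identity, $\phi_2$ is the logistic function, $\psi_k(x) = x_k$ (so $d_x = N_2$), and the parameter vector stores the $N_1$ output weights first, followed by the hidden weights row by row. The function names are hypothetical.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def network(theta, x, n1, n2):
    """Return (G_theta(x), hidden activations) for the illustrative network."""
    hidden = []
    for i in range(n1):
        w = theta[n1 + i * n2: n1 + (i + 1) * n2]
        hidden.append(logistic(sum(wk * xk for wk, xk in zip(w, x))))
    value = sum(a * h for a, h in zip(theta[:n1], hidden))
    return value, hidden


def network_gradient(theta, x, n1, n2):
    """Return (G_theta(x), H_theta(x)), where H_theta is the gradient of G in theta."""
    value, hidden = network(theta, x, n1, n2)
    grad = list(hidden)  # derivative w.r.t. output weight i is the i-th hidden output
    for i in range(n1):
        slope = theta[i] * hidden[i] * (1.0 - hidden[i])  # chain rule through the logistic
        grad.extend(slope * xk for xk in x)
    return value, grad


def train_step(theta, x, y, alpha, n1, n2):
    """One step theta + alpha * (y - G_theta(x)) * H_theta(x) of recursion (4.1)."""
    value, grad = network_gradient(theta, x, n1, n2)
    return [t + alpha * (y - value) * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    theta = [0.5, -0.3, 0.1, 0.2, -0.4, 0.3]
    print(train_step(theta, [1.0, -2.0], 1.0, 0.01, 2, 2))
```

In practice `train_step` would be applied along a training sequence with decreasing step sizes; the sketch only shows how the error signal and the gradient of the network output combine in one update.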

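The asymptotic behavior of algorithm (4.1) is analyzed under the following assumptions:

Assumption 4.1. $\phi_1(\cdot)$ and $\phi_2(\cdot)$ are real-analytic. Moreover, $\phi_1(\cdot)$ and $\phi_2(\cdot)$
have (complex-valued) continuations $\hat{\phi}_1(\cdot)$ and $\hat{\phi}_2(\cdot)$ (respectively) with the following
properties:

(i) $\hat{\phi}_1(z)$ and $\hat{\phi}_2(z)$ map $z \in \mathbb{C}$ into $\mathbb{C}$ ($\mathbb{C}$ denotes the set of complex numbers).

(ii) $\hat{\phi}_1(x) = \phi_1(x)$ and $\hat{\phi}_2(x) = \phi_2(x)$ for all $x \in \mathbb{R}$.

(iii) There exist real numbers $\varepsilon \in (0, 1)$, $K \in [1, \infty)$ such that $\hat{\phi}_1(\cdot)$ and $\hat{\phi}_2(\cdot)$ are
analytic on $V_\varepsilon = \{z \in \mathbb{C} : d(z, \mathbb{R}) \le \varepsilon\}$, and such that

$$|\hat{\phi}_1(z)| \le K(1 + |z|),$$
$$\max\{|\hat{\phi}'_1(z)|, |\hat{\phi}_2(z)|, |\hat{\phi}'_2(z)|\} \le K$$

for all $z \in V_\varepsilon$ ($\hat{\phi}'_1(\cdot)$, $\hat{\phi}'_2(\cdot)$ are the derivatives of $\hat{\phi}_1(\cdot)$, $\hat{\phi}_2(\cdot)$).

Assumption 4.2. $\{(x_n, y_n)\}_{n\ge 0}$ are i.i.d. random variables distributed according
to the probability measure $\pi(\cdot, \cdot)$. There exists a real number $L \in [1, \infty)$ such that
$\max_{1 \le k \le N_2} |\psi_k(x_0)| \le L$ and $|y_0| \le L$ w.p.1.

Our main results on the properties of objective function $f(\cdot)$ and algorithm (4.1)
are contained in the next two theorems.

Theorem 4.1. Let Assumptions 4.1 and 4.2 hold. Then, $f(\cdot)$ is analytic on
entire $\mathbb{R}^{d_\theta}$, i.e., it satisfies Assumptions 2.3 and 2.4.

Theorem 4.2. Let Assumptions 3.1, 4.1 and 4.2 hold. Then,

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p}), \quad d(f(\theta_n), C) = o(\gamma_n^{-p})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$. Moreover, the following is true:

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = o(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = o(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{\hat{r} > r\}$, and

$$\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = O(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = O(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{\hat{r} \le r\}$.

The proofs are provided in Section 9. $C$, $S$, $p$, $\hat{p}$, $\hat{q}$ and $\hat{r}$ are defined in Section 2.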
Assumption 4.1 is related to the neural network being trained. It covers some of
the most popular feedforward architectures such as backpropagation networks with
logistic activations^1 and radial basis function networks with Gaussian activations^2.
On the other side, Assumption 4.2 corresponds to the training sequence $\{(x_n, y_n)\}_{n\ge 0}$,
and is quite common for the analysis of supervised learning.

^1 Since

$$|1 + \exp(-z)|^2 = 1 + \exp(-2\mathrm{Re}(z)) + 2\exp(-\mathrm{Re}(z))\cos(\mathrm{Im}(z)) \ge 1 + \exp(-2\mathrm{Re}(z))$$

when $|\mathrm{Im}(z)| \le \pi/2$, the complex-valued logistic function $h(z) = (1 + \exp(-z))^{-1}$ is analytic on
$\{z \in \mathbb{C} : d(z, \mathbb{R}) \le \pi/2\}$. For the same reason, $\max\{|h(z)|, |h'(z)|\} \le 1$ on $\{z \in \mathbb{C} : d(z, \mathbb{R}) \le \pi/2\}$.

^2 The complex-valued Gaussian activation $h(z) = (2\pi)^{-1/2} \exp(-z^2/2)$ is analytic on entire $\mathbb{C}$. As

$$(1 + |z|)\,|\exp(-z^2/2)| \le (1 + |\mathrm{Re}(z)| + |\mathrm{Im}(z)|) \exp(-\mathrm{Re}^2(z)/2 + \mathrm{Im}^2(z)/2) \le 3e$$

when $|\mathrm{Im}(z)| \le 1$, we have $\max\{|h(z)|, |h'(z)|\} \le 3e$ on $\{z \in \mathbb{C} : d(z, \mathbb{R}) \le 1\}$.

The asymptotic properties of supervised learning algorithms have been studied in 
a large number of papers (see [H], [l^ and references cited therein). Unfortunately, 
the available literature does not provide any information on the rate of convergence
which can be verified for feedforward networks with nonlinear activation functions.
The main difficulty is that the existing results on the convergence
rate of stochastic gradient search require the objective function to have an isolated
minimum at which the Hessian is strictly positive definite. Since the objective function
is highly nonlinear in the case of supervised learning algorithms, it is hard (if not
impossible) to show even the existence of isolated minima, let alone the definiteness
of the Hessian. As opposed to the existing results, Theorem 4.2 does not invoke any
of these requirements and covers some of the most widely used feedforward neural
networks.

5. Example 2: Temporal-Difference Learning. In this section, the results
of Theorems 2.2 and 3.1 are illustrated by applying them to the analysis of temporal-difference
learning algorithms.

In order to explain temporal-difference learning and to define the corresponding
algorithms, we use the following notation. $N > 1$ is an integer, while $\mathcal{X} = \{1, \ldots, N\}$.
$\{x_n\}_{n\ge 0}$ is an $\mathcal{X}$-valued Markov chain defined on a probability space $(\Omega, \mathcal{F}, P)$, while
$\{c(i)\}_{i \in \mathcal{X}}$ are real numbers. $\beta \in (0, 1)$ is a constant, while

$$g(i) = E\left( \sum_{n=0}^{\infty} \beta^n c(x_n) \,\Big|\, x_0 = i \right)$$

for $i \in \mathcal{X}$. For each $i \in \mathcal{X}$, $G_\theta(i)$ is a real-valued differentiable function of $\theta \in \mathbb{R}^{d_\theta}$,
while

$$f(\theta) = \frac{1}{2} \lim_{n\to\infty} E(g(x_n) - G_\theta(x_n))^2$$

for $\theta \in \mathbb{R}^{d_\theta}$. With this notation, the problem of temporal-difference learning can be
posed as the minimization of $f(\cdot)$ in a situation when only a realization of $\{x_n\}_{n\ge 0}$
is available. In this context, $c(i)$ is considered as a cost of visiting state $i$, while $g(i)$
is regarded as the total discounted cost incurred by $\{x_n\}_{n\ge 0}$ when $\{x_n\}_{n\ge 0}$ starts
from state $i$. $G_\theta(\cdot)$ is a parameterized approximation of $g(\cdot)$, while $\theta$ is the parameter
to be tuned through the process of temporal-difference learning. For more details on
temporal-difference learning, see e.g., [3], [27], [29] and references cited therein.

Function $f(\cdot)$ can be minimized by the following algorithm:

$$\theta_{n+1} = \theta_n + \alpha_n (c(x_n) + \beta G_{\theta_n}(x_{n+1}) - G_{\theta_n}(x_n)) y_n, \tag{5.1}$$
$$y_{n+1} = \beta y_n + H_{\theta_n}(x_{n+1}), \quad n \ge 0. \tag{5.2}$$

In this recursion, $\{\alpha_n\}_{n\ge 0}$ is a sequence of positive reals, while $H_\theta(\cdot) = \nabla_\theta G_\theta(\cdot)$.
$\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable, which is defined on probability space $(\Omega, \mathcal{F}, P)$
and independent of $\{x_n\}_{n\ge 0}$. In the literature on reinforcement learning, recursion
(5.1), (5.2) is known as the TD(1) temporal-difference learning algorithm with a nonlinear
function approximation, while $G_\theta(\cdot)$ is referred to as a function approximation, or just
as an 'approximator.'
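
The following sketch shows one way recursion (5.1), (5.2) could be simulated on a small finite chain. Several details are assumptions made purely for illustration and are not specified in the text above: the initial trace is taken as $y_0 = H_{\theta_0}(x_0)$, the gradient in (5.2) is evaluated at $\theta_n$, states are indexed from 0, and a tabular approximator $G_\theta(i) = \theta_i$ is used. The helper `discounted_cost` computes the total discounted cost $g$ by fixed-point iteration of $g = c + \beta P g$, so the learned values can be compared with it.

```python
import math
import random


def discounted_cost(P, c, beta, n_iter=200):
    """Total discounted cost g(i), obtained by iterating g <- c + beta * P g."""
    n = len(c)
    g = [0.0] * n
    for _ in range(n_iter):
        g = [c[i] + beta * sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]
    return g


def tabular_value(theta, i):
    """Illustrative approximator G_theta(i) = theta[i]."""
    return theta[i]


def tabular_gradient(theta, i):
    """Gradient of the tabular approximator: the i-th unit vector."""
    return [1.0 if j == i else 0.0 for j in range(len(theta))]


def td1(P, c, beta, G, H, theta0, step, n_steps, seed=0):
    """Simulate (5.1), (5.2); the trace initialisation y_0 is an assumption."""
    rng = random.Random(seed)
    states = range(len(c))
    x = 0
    theta = list(theta0)
    y = H(theta, x)
    for n in range(n_steps):
        x_next = rng.choices(states, weights=P[x])[0]
        delta = c[x] + beta * G(theta, x_next) - G(theta, x)  # temporal difference
        new_theta = [t + step(n) * delta * yk for t, yk in zip(theta, y)]
        y = [beta * yk + hk for yk, hk in zip(y, H(theta, x_next))]
        theta, x = new_theta, x_next
    return theta


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.5, 0.5]]
    c = [1.0, 0.0]
    print("g =", discounted_cost(P, c, 0.5))
    print("theta =", td1(P, c, 0.5, tabular_value, tabular_gradient,
                         [0.0, 0.0], lambda n: 1.0 / (n + 10), 5000))
```

Replacing `tabular_value` and `tabular_gradient` with a nonlinear analytic approximator and its gradient gives the nonlinear setting treated in this section.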
We analyze algorithm (5.1), (5.2) under the following assumptions:

Assumption 5.1. $\{x_n\}_{n\ge 0}$ is geometrically ergodic.

Assumption 5.2. For each $i \in \mathcal{X}$, $G_\theta(i)$ is analytic in $\theta$ on entire $\mathbb{R}^{d_\theta}$.

Our main results on the properties of $f(\cdot)$ and asymptotic behavior of the algorithm
(5.1), (5.2) are presented in the next two theorems.

Theorem 5.1. Let Assumptions 5.1 and 5.2 hold. Then, $f(\cdot)$ is analytic on
entire $\mathbb{R}^{d_\theta}$, i.e., it satisfies Assumptions 2.3 and 2.4.

Theorem 5.2. Let Assumptions 3.1, 5.1 and 5.2 hold. Then,

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p}), \quad d(f(\theta_n), C) = o(\gamma_n^{-p})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\}$. Moreover, the following is true:

$$\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = o(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = o(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{\hat{r} > r\}$, and

$$\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat{p}}), \quad d(f(\theta_n), C) = O(\gamma_n^{-\hat{p}}), \quad d(\theta_n, S) = O(\gamma_n^{-\hat{q}})$$

w.p.1 on $\{\sup_{n\ge 0} \|\theta_n\| < \infty\} \cap \{\hat{r} \le r\}$.

The proofs are provided in Section 10. $C$, $S$, $p$, $\hat{p}$, $\hat{q}$ and $\hat{r}$ are defined in Section 2.

Assumption 5.1 corresponds to the stability of Markov chain $\{x_n\}_{n\ge 0}$. In this
or similar form, it is involved in any result on the asymptotic behavior of temporal-difference
learning. On the other side, Assumption 5.2 is related to the properties of
$G_\theta(\cdot)$. It covers some of the most popular function approximations used in the area
of reinforcement learning (e.g., polynomial approximations and feedforward neural 
networks with analytic activation functions; for details see |3], [57], p5]V 

Asymptotic properties of temporal-difference learning have been the subject of a 
number of papers (see [3], [27] and the references cited therein). However, the available 
literature on reinforcement learning does not offer any information on the rate of 
convergence of algorithm (5.1), (5.2) in the case when $G_\theta(\cdot)$ is nonlinear in $\theta$. 
As in the case of supervised learning, the main difficulty is caused by the 
fact that the existing results on the convergence rate of stochastic gradient search 
require $f(\cdot)$ to have an isolated minimum at which $\nabla^2 f(\cdot)$ is strictly positive definite. 
Unless $G_\theta(\cdot)$ is linear in $\theta$, $f(\cdot)$ is so complex that these requirements are practically 
impossible to verify. On the other hand, Theorem 5.2 does not impose any restriction on 
the topological properties of the minima of $f(\cdot)$, or on the values of $\nabla^2 f(\cdot)$. Moreover, 
it can be applied to many temporal-difference learning algorithms met in practice. 
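
As an illustration of the situation that classical rate results exclude, consider the hypothetical objective $f(\theta) = (\theta_1^2 + \theta_2^2 - 1)^2$ (an example chosen here, not taken from the paper). Its minima form the unit circle, so none of them is isolated, and the Hessian is singular at each of them. Nevertheless, a Lojasiewicz-type inequality $f(\theta) - f^* \le M \|\nabla f(\theta)\|^{\mu}$ holds with $\mu = 2$ and $M = 1/4$ whenever $\|\theta\| \ge 1/2$. A minimal Python sketch checking these facts numerically:

```python
import math


def f(theta):
    # Hypothetical objective; its minimizers are all points with |theta| = 1.
    r2 = theta[0] ** 2 + theta[1] ** 2
    return (r2 - 1.0) ** 2


def grad_f(theta):
    r2 = theta[0] ** 2 + theta[1] ** 2
    c = 4.0 * (r2 - 1.0)
    return (c * theta[0], c * theta[1])


def hessian_f(theta):
    # d^2 f / d theta_i d theta_j = 4 (r^2 - 1) delta_ij + 8 theta_i theta_j
    r2 = theta[0] ** 2 + theta[1] ** 2
    d = 4.0 * (r2 - 1.0)
    return [[d + 8.0 * theta[0] ** 2, 8.0 * theta[0] * theta[1]],
            [8.0 * theta[0] * theta[1], d + 8.0 * theta[1] ** 2]]


def lojasiewicz_holds(theta, M=0.25, mu=2.0, tol=1e-12):
    # Checks f(theta) - f_min <= M * |grad f(theta)|^mu, with f_min = 0.
    g = grad_f(theta)
    return f(theta) <= M * math.hypot(g[0], g[1]) ** mu + tol
```

Here $\|\nabla f(\theta)\|^2 = 16 \|\theta\|^2 f(\theta)$, which is why the constant $M = 1/4$ works on $\{\|\theta\| \ge 1/2\}$.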

Regarding the results of this section, the following note is also in order. Using the 
arguments Theorems 4.1 and 5.2 are based on, it is possible (at the cost of significantly 
increasing the amount of technical details) to generalize Theorems 5.1 and 5.2 to 
the case when $\{x_n\}_{n\ge 0}$ is a continuous-state Markov chain, as well as to the actor-critic 
learning algorithms proposed in [13]. 

6. Example 2: Identification of Linear Stochastic Dynamical Systems. 

In this section, the general results presented in Sections 2 and 3 are applied to the 
asymptotic analysis of recursive prediction error algorithms for the identification of linear 
stochastic dynamical systems. To avoid unnecessary technical details and complicated 
notation, only the identification of one-dimensional ARMA models is considered here. 
However, it is straightforward to generalize the obtained results to any linear stochastic 
dynamical system. 

In order to state the problem of recursive prediction error identification in ARMA 
models, we use the following notation. $M$ and $N$ are positive integers, while $d_\theta = M + N$. 
For $a_1, \dots, a_M \in \mathbb{R}$ and $b_1, \dots, b_N \in \mathbb{R}$, let 

M N 

Ag{z) f^kZ'^^ Bg{z) = 1 + XI ^kZ^''^ 

k=l k=l 

where $\theta = [a_1 \cdots a_M \ b_1 \cdots b_N]^T$ and $z \in \mathbb{C}$ ($\mathbb{C}$ denotes the set of complex numbers). 
Moreover, let 

$\Theta = \{\theta \in \mathbb{R}^{d_\theta} : B_\theta(z) = 0 \Rightarrow |z| > 1\}.$ 

On the other side, $\{y_n\}_{n\ge 0}$ is a real-valued signal generated by the actual system 
(i.e., by the system being identified). For $\theta \in \Theta$, $\{y_n^\theta\}_{n\ge 0}$ is the output of the ARMA 
model 

$A_\theta(q) y_n^\theta = B_\theta(q) e_n, \quad n \ge 0,$ (6.1) 

where $\{e_n\}_{n\ge 0}$ is a real-valued white noise and $q^{-1}$ is the backward time-shift operator. 
$\{\varepsilon_n^\theta\}_{n\ge 0}$ is the process generated by the recursion 

$B_\theta(q) \varepsilon_n^\theta = A_\theta(q) y_n, \quad n \ge 0,$ (6.2) 

while $\hat{y}_n^\theta = y_n - \varepsilon_n^\theta$ and 

$f(\theta) = \frac{1}{2} \lim_{n\to\infty} E\big( (\varepsilon_n^\theta)^2 \big).$ 

Then, $\hat{y}_n^\theta$ is a mean-square optimal estimate of $y_n$ given $y_0, \dots, y_{n-1}$ (which the model 
(6.1) can provide; see e.g., [18], [H]). Consequently, $\varepsilon_n^\theta$ can be interpreted as the 
estimation error. 

The parametric identification in ARMA models can be defined as the following 
estimation problem: Given a realization of $\{y_n\}_{n\ge 0}$, estimate the values of $\theta$ for 
which the model (6.1) provides the best approximation to the signal $\{y_n\}_{n\ge 0}$. If 
the identification is based on the prediction error principle, the estimation problem 
reduces to the minimization of $f(\cdot)$ over $\Theta$. As the asymptotic value of the second 
moment of $\varepsilon_n^\theta$ is rarely available analytically, $f(\cdot)$ is minimized by a stochastic gradient 
(or stochastic Newton) algorithm. Such an algorithm is defined by the following 
difference equations: 

$\phi_n = [y_n \cdots y_{n-M+1} \ \varepsilon_n \cdots \varepsilon_{n-N+1}]^T,$ (6.3) 

$\varepsilon_{n+1} = y_{n+1} - \phi_n^T \theta_n,$ (6.4) 

$\psi_{n+1} = \phi_n - [\psi_n \cdots \psi_{n-N+1}] A_0 \theta_n,$ (6.5) 

$\theta_{n+1} = \theta_n + \alpha_n \psi_{n+1} \varepsilon_{n+1}, \quad n \ge 0.$ (6.6) 

In this recursion, $\{\alpha_n\}_{n\ge 0}$ denotes a sequence of positive reals, while $A_0$ is a 
composite matrix defined as $A_0 = [0_{N\times M} \ I_{N\times N}]$. $\{y_n\}_{n\ge -M}$ is a real-valued stochastic 
process defined on a probability space $(\Omega, \mathcal{F}, P)$, while $\theta_0 \in \Theta$, $\varepsilon_0, \dots, \varepsilon_{1-N} \in \mathbb{R}$ 
and $\psi_0, \dots, \psi_{1-N} \in \mathbb{R}^{d_\theta}$ are random variables defined on the same probability space. 
$\theta_0$, $\varepsilon_0, \dots, \varepsilon_{1-N}$, $\psi_0, \dots, \psi_{1-N}$ represent the initial conditions of the algorithm 
(6.3) - (6.6). 
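
The recursion (6.3) - (6.6) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the algorithm as analyzed in the paper: zero initial conditions, step sizes $\alpha_n = 1/(n+50)$, Gaussian white noise in the hypothetical simulator `simulate_arma`, and a crude truncation device (an update is rejected when $\sum_k |b_k| \ge$ `b_bound`) are all choices made here for the example.

```python
import random


def simulate_arma(a, b, steps, seed=0):
    """Simulate y_n = sum_k a_k y_{n-k} + e_n + sum_k b_k e_{n-k} (sketch)."""
    rng = random.Random(seed)
    M, N = len(a), len(b)
    y_hist, e_hist, out = [0.0] * M, [0.0] * N, []
    for _ in range(steps):
        e = rng.gauss(0.0, 1.0)
        y = (sum(ak * yk for ak, yk in zip(a, y_hist)) + e
             + sum(bk * ek for bk, ek in zip(b, e_hist)))
        out.append(y)
        y_hist = ([y] + y_hist)[:M]
        e_hist = ([e] + e_hist)[:N]
    return out


def rpe_arma(y, M, N, theta0, b_bound=0.95):
    """Recursive prediction error sketch following (6.3)-(6.6)."""
    d = M + N
    theta = list(theta0)
    y_hist = [0.0] * M                       # y_n, ..., y_{n-M+1}
    eps_hist = [0.0] * N                     # eps_n, ..., eps_{n-N+1}
    psi_hist = [[0.0] * d for _ in range(N)] # psi_n, ..., psi_{n-N+1}
    for n, y_next in enumerate(y):
        alpha = 1.0 / (n + 50)               # assumed step-size sequence
        phi = y_hist + eps_hist                                        # (6.3)
        eps_next = y_next - sum(p * t for p, t in zip(phi, theta))     # (6.4)
        b = theta[M:]                        # A_0 theta_n: the b-coefficients
        psi_next = [phi[j] - sum(b[k] * psi_hist[k][j] for k in range(N))
                    for j in range(d)]                                 # (6.5)
        cand = [t + alpha * ps * eps_next
                for t, ps in zip(theta, psi_next)]                     # (6.6)
        # Hypothetical truncation: keep the update only if sum |b_k| < b_bound.
        if sum(abs(v) for v in cand[M:]) < b_bound:
            theta = cand
        y_hist = ([y_next] + y_hist)[:M]
        eps_hist = ([eps_next] + eps_hist)[:N]
        psi_hist = ([psi_next] + psi_hist)[:N]
    return theta
```

A typical call is `rpe_arma(simulate_arma([0.5], [0.3], 2000), 1, 1, [0.0, 0.0])`. The truncation used here is only a simple sufficient condition for the stability of the filters in (6.4) and (6.5); it stands in for the projection device mentioned in the text.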

In the literature on system identification, recursion (6.3) - (6.6) is known as the 
recursive prediction error algorithm for ARMA models (for more details, see [18], [19] and 
references cited therein). It usually involves a projection (or truncation) device which 
ensures that the estimates $\{\theta_n\}_{n\ge 0}$ remain in $\Theta$. However, in order to avoid unnecessary 
technical details and to keep the exposition as concise as possible, this aspect of 
algorithm (6.3) - (6.6) is not discussed here. Instead, similarly as in [17] - [19], we 
state our asymptotic results (Theorem 6.2) in a local form. 

Algorithm (6.3) - (6.6) is analyzed under the following assumptions: 

Assumption 6.1. There exist a positive integer $L$, a matrix $A \in \mathbb{R}^{L\times L}$, a vector 
$b \in \mathbb{R}^{L}$ and $\mathbb{R}^{L}$-valued stochastic processes $\{x_n\}_{n\ge -M}$, $\{w_n\}_{n\ge -M}$ (defined on 
$(\Omega, \mathcal{F}, P)$) such that the following holds: 

(i) $x_{n+1} = A x_n + w_n$ and $y_n = b^T x_n$ for $n \ge -M$. 

(ii) The eigenvalues of $A$ lie in $\{z \in \mathbb{C} : |z| < 1\}$. 

(iii) {wn}n>-M ore i.i.d. and independent ofOo, xi^m, Eq? ■ • • ,£i~N, ipo, ■ ■ ■ ^ipi-N- 

(iv) E\\wot<^. 
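
Condition (ii) of Assumption 6.1 is easy to check for small models. The following Python sketch is an illustration only: it assumes a hypothetical two-dimensional state ($L = 2$), computes the eigenvalues of $A$ from its characteristic polynomial, and simulates $x_{n+1} = A x_n + w_n$, $y_n = b^T x_n$ with Gaussian noise and a zero initial state (both assumptions made here).

```python
import cmath
import random


def eigenvalues_2x2(A):
    # Roots of z^2 - trace(A) z + det(A) = 0.
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return ((tr + disc) / 2.0, (tr - disc) / 2.0)


def is_stable(A):
    # Condition (ii): all eigenvalues strictly inside the unit disc.
    return all(abs(z) < 1.0 for z in eigenvalues_2x2(A))


def simulate_signal(A, b, steps, seed=0):
    # Condition (i): x_{n+1} = A x_n + w_n, y_n = b^T x_n, with i.i.d. noise.
    rng = random.Random(seed)
    x = [0.0, 0.0]
    ys = []
    for _ in range(steps):
        ys.append(b[0] * x[0] + b[1] * x[1])
        w = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        x = [A[0][0] * x[0] + A[0][1] * x[1] + w[0],
             A[1][0] * x[0] + A[1][1] * x[1] + w[1]]
    return ys
```

For larger $L$, a numerical eigenvalue routine would replace the closed-form quadratic used here.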

Assumption 6.2. For any compact set Q C Q, 

supi; ((4 + ||^„r)/{ >„}) < c^, (6.7) 

ri>0 

where tq = mi{n > : 9n ^ Q} ■ 

Our main result on the analyticity of $f(\cdot)$ is contained in the next theorem. 

Theorem 6.1. Suppose that $\{y_n\}_{n\ge 0}$ is a weakly stationary process such that 

$\sum_{n=0}^{\infty} |\mathrm{Cov}(y_0, y_n)| < \infty.$ 

Then, /(•) is analytic on entire Q, i.e., the following is true: For any compact set 
Q d O and any a G f{Q), there exist real numbers Sq ^, fJ-Q,a £ (lj2], i^q G (0,1], 
Mq a G [1, oo), Nq such that V2. !A) holds for all 9 £ Q and such that \2. S\) is satisfied 
for each 9 G Q fulfilling \ f{9) — a\ < Sq^a- 

In order to state our main result on the convergence rate of algorithm (6.3) - (6.6), 
we use the following notation. $A$ is the event defined by 

$A = \left\{ \sup_{n\ge 0} \|\theta_n\| < \infty, \ \inf_{n\ge 0} d(\theta_n, \partial\Theta) > 0 \right\}.$ 

$\hat{A}$ is the set of accumulation points of $\{\theta_n\}_{n\ge 0}$, while 

$\hat{\rho} = 2^{-1} d(\hat{A}, \partial\Theta) I_A, \qquad \hat{f} = \liminf_{n\to\infty} f(\theta_n).$ 

$\hat{Q}$ is the random set defined as 

$\hat{Q} = \{\theta \in \mathbb{R}^{d_\theta} : d(\theta, \hat{A}) \le \hat{\rho}\}$ on $A$, and $\hat{Q} = \hat{A}$ otherwise. 

$\hat{\delta}$, $\hat{\mu}$, $\hat{\nu}$ are random quantities defined by (2.4) on $A$ and by (2.5) otherwise. Random 
quantities $\hat{p}$, $\hat{q}$, $\hat{r}$ are defined by (2.6). With this notation, our main result on the 
convergence rate of algorithm (6.3) - (6.6) reads as follows. 

Theorem 6.2. Let A ssumvtions [^J[ \6.1\ and \ 6. g| hold. Then, 

||V/(0„)f = 0(7-^), d{f{9,,),C) = 0(7-^) 
w.p.1 on $A$. Moreover, the following is true: 



l|V/(0„)f 



w.p.l on An {f > r}, and 



||v/((?„)f -0(7-^), d(/(^^„),c) = 0(7,7^), d(0„,5) = o(7~«) 



w.p. 1 on A n {f < r} . 

The proofs are provided in Section 11. $C$ and $S$ are defined in Section 2. 

Assumption 6.1 corresponds to the signal $\{y_n\}_{n\ge 0}$. It is quite common in the 
asymptotic analysis of recursive identification algorithms (see e.g., [H Part I]) and 
covers all stable linear Markov models. Assumption 6.2 is related to the stability of 
subrecursion (6.3) - (6.5) and its outputs $\{\varepsilon_n\}_{n\ge 0}$, $\{\psi_n\}_{n\ge 0}$. In this or a similar form, 
Assumption 6.2 is involved in most of the asymptotic results on recursive prediction 
error identification algorithms. E.g., [18, Theorems 4.1 - 4.3] (which are probably 
the most general and famous results of this kind) require the sequence $\{(\varepsilon_n, \psi_n)\}_{n\ge 0}$ to 
visit a fixed compact set infinitely often w.p.1 on event $A$. When $\{y_n\}_{n\ge 0}$ is generated 
by a stable linear Markov system, such a requirement is practically equivalent to (6.7). 
Various aspects of recursive prediction error identification in linear stochastic 
dynamical systems have been the subject of numerous papers and books (see [18], [19] 
and references cited therein). Despite providing a deep insight into the asymptotic 
behavior of recursive prediction error identification algorithms, the available results 
do not offer information about the convergence rate which can be verified for models 
of a moderate or high order (e.g., M and N are three or above). The main difficulty 
is the same as in the case of supervised learning. The existing results on the convergence 
rate of stochastic gradient search require $f(\cdot)$ to have an isolated minimum which is 
the limit of $\{\theta_n\}_{n\ge 0}$ and at which $\nabla^2 f(\cdot)$ is strictly positive definite. Unfortunately, 
$f(\cdot)$ is so complex (even for relatively small $M$ and $N$) that these requirements are 
practically impossible to verify. Theorem 6.2, by contrast, relies on none of them. 

Regarding Theorems 6.1 and 6.2, it should be mentioned that these results can be 
generalized in several ways. E.g., it is straightforward to extend them to practically 
any stable multiple-input, multiple-output linear system. Moreover, it is possible to 
show that the results also hold for signals $\{y_n\}_{n\ge 0}$ satisfying mixing conditions of the 
type [m Condition SI, p. 169]. 

7. Proof of Theorems 2.1 and 2.2. In this section, the following notation is 
used. Let $A$ be the event 

$A = \left\{ \sup_{n\ge 0} \|\theta_n\| < \infty \right\}.$ 


For e G (0, 00), let 



(f>{w) 



+ 



S. 



For 6* e K''^ let 



<0) 



( 



0, 



/)-i/p, if/(0)>/ 



otherwise 
{p is introduced in Section [2]). On the other side, for < n < k, let Un,n = Oj 
and 



fc-i 



Un,k 



i—n 

-(V/(^?„)r5]a.(V/(0,)-V/(0„)), 



fe-i 



Then, it is straightforward to show 

f{Ok) - fi9n) = - (7fe - 7n)l|V/(0„)f - (V/(^?„))^U„,fc + 



(7.1) 



for < n < A:. 

Regarding the notation, the following note is also in order: the tilde symbol ($\tilde{\ }$) is used for 
locally defined quantities, i.e., for a quantity whose definition holds only in the proof 
where such a quantity appears. 

Lemma 7.1. Let Assumvtions \2.1 \ and hold. Then, there exists an event 
No eT such that P(iVo) = and 



limsup7^ max ||Mn,fc|| < w < oo 

n— ►oo n<k<a(n,l) 



(7.2) 



on A \ Nq. 

Proof. It is straightforward to verify 



fe-l / i \ fe-l 

Z— 71 



ioT < n < k. Consequently, 

ik„,.ii<f7r + E(^r-7r+i) 

for < n < A:. Thus, 





3 




i 


max 

/ n<j<k 


i—n 


= 7„ max 

n<j<k 


■i—n 



ln\\un,k\\ < ^max 

n<j <a(n,l) 



for < n < fc < a(n, 1). Then, the lemma's assertion directly follows from Assumption 

o □ 

Lemma 7.2. Suppose that A s sumptions \ 2. 1\ - \2.3\ hold. Moreover, let e € (0, oo) 
be an arbitrary positive real number. Then, there exist random quantities Ci, t (which 
are deterministic functions of C ; C is defined in Section\^ and a non-negative integer- 
valued random variable such that 1 < C < oo, 0<t<l,0<cre<oo everywhere 
and such that 

max _ if {0k) ~ /(0„)) <j-P/^\\Vfi9n)\\Mw) + C,j-^p/HMw))\ (7.3) 

n<fc<a(n,i) 

/(^^a(„,£)) - fiOn) < - £||v/(0„)ll V2 + %^^n^f{eMeH + c\%'p^^{Mw))^ 

(7.4) 

on A \ Nq for n > (jl is introduced in Section\^. 
Proof Let Ci = 12C^exp{2C), t = 1/(4(71), while 

CTi = max ({n > : 6,, ^ Q} U {0}) , 
CT2 = max ({n > : a„ > i/3} U {0}) , 

ag.e-maxf |n>0: max \\un,k\\ > ln^^^4>e{w)\ U {0}] 

y n</i:<a(n,l) J y 

and (Tj = max{(7i, (72, (T3_£}/A\Ary . Then, it is obvious that is well-defined. On the 
other side, Lemma \7A] yields 



limsup7^/'^ max |litn,fc|| = limsup7^ max ||un,fe|| ^ w < (/'e(w) 

n^oo n<k<a(n,l) ' n^oo n<k<a{n,l) 

on (A \ iVo) n {r > r} (notice that if r < J^, then p/ p, = r and (peiw) > w -\- e > w) and 
limsup7^^'' max ||wn,fc|| = limsup7jj^'^^'"u' = < (pdw) 

n^oc 7i<k<a(n,l) ' ri^oo 

on (A \ iVo) n {?" < r} (notice that if r > f, then p/ fi = f < r and cj)e{w) > £ > 0). 
Thus, < < oo everywhere. Moreover, we have 

max \\u„,k\\<-/n''^^Mw)^ (7-5) 

n<fc<a(n, 1) 

i > lainl) - 7n = 7a(„,i) + l " 7« - > 2^/3 (7.6) 

on A \ A'o for n> a^- On the other side, (|7.5p yields 

||V/(0,)|| <||V/(e„)|| + ||V/(0,) - V/(0„)|| 

<||V/(0„)|| + C||^?fc-0„|| 
fc-1 

<||V/(0„)|| + a,||V/(&0|| + C\\un,k\\ 

i—7i 

k-1 

<|lV/(ft„)I| + Cj-P/^Mw) + a.||V/(0,)I| 

i—n 

on A for < n < k. Then, Bellman-Gronwall inequality implies 

||V/(0fe)|| < {\\Vfi9„)\\+C-/-P/f^M^)) exp (C(7a(n,l) -7n) 

<CeMC) {\\Vfi9„)\\+i-P^^M^)) 
on A \ for a£ < n < A; < a{n, 1) (notice that Ja{n,i) — In < !)■ Consequently, (|7.5p 
gives 

fc-i 

\\0k-On\\ <^a»||V/(&OII + hn,fe|| 

i—n 

<CeMC) (||V/(0„)|| +7-^/^0.(^«)) {lk-ln) + l-^^^Mw) 
<2CeMC) ((7fe - 7n)l|V/(0„)|| + -f-^/^M^)) 

on A \ Nq ior ae < n < k < a(n, 1). Therefore, 
fe-i 

Kk\ <c||v/(0„)llE"»ll^»-^"ll 

i—n 

<2c^eMc) {hk-inf\\yf{o,,)f + i-^^^iik-inmf{eMeH) 
Kfel <c\\0k-ej' 

<4C^exp(2(7) ((7fe -^„)||V/(0„)I1 +7;7^/''0s(^«))' 
<8C3exp(2(7) ((7,-^„)2||v/(0„)f + ^-2p/A(^^(^))2^ 

on A \ iVo for (Te < n < A; < a{n, 1). Thus, 

< c, {hk-inn^fien)r+i-'^^^{Mw)r) (7.?) 

on A \ iVo for (Tg < n < A < a{n, 1). Since 

^1(7^: - In) < Cl(7,(„,t) - 7n) < C,i < 1/4 

for < n < A: < a{n, i) (due to dZ!])), and dTT]) yield 

f{Ok)- f{en)<-{lk-ln)[l'C,{^k-ln)) ||V/(0„)f 

+ 7,7^/''l|V/(0„)||(/.e(^«) + Cl7„-'^/''(0e(^«))' 

<-3(7fe-7n)||V/(0„)||V4 
+ 7-^/'^||V/(0„)||0e(u;) + C,-i-^^/>^{Uw)f (7.8) 

on A \ A''o for tJe < n < fc < a{n,i). As an immediate consequence of (|7.6p . (|7.8p . we 
get that dZSl), (173) hold on A \ TVo for n> a^. □ 
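
The Bellman-Gronwall step in the proof above uses the discrete form of the inequality: if $x_k \le a + C \sum_{i=n}^{k-1} \alpha_i x_i$ with non-negative $\alpha_i$, then $x_k \le a \exp\big(C \sum_{i=n}^{k-1} \alpha_i\big)$. The following Python check, with hypothetical function names, evaluates the extremal sequence that satisfies the hypothesis with equality, for which $x_k = a \prod_{i<k} (1 + C\alpha_i)$, and compares it with the exponential bound.

```python
import math


def gronwall_bound(a, C, alphas):
    # Right-hand side of the discrete Bellman-Gronwall inequality.
    return a * math.exp(C * sum(alphas))


def extremal_sequence(a, C, alphas):
    # x_k = a + C * sum_{i<k} alpha_i * x_i, i.e. the hypothesis with equality.
    xs = [a]
    acc = 0.0
    for al in alphas:
        acc += al * xs[-1]
        xs.append(a + C * acc)
    return xs
```

The bound follows from $1 + x \le e^{x}$ applied to each factor $1 + C\alpha_i$.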

Lemma 7.3. Suppose that Assumptions 2.1 - 2.3 hold. Then, $\lim_{n\to\infty} \nabla f(\theta_n) = 0$ 
on $A \setminus N_0$. 

Proof. The lemma's assertion is proved by contradiction. We assume that 
limsup„^oo li^/(^n)ll > for some sample uj £ A \ No (notice that all formulas 
which follow in the proof correspond to this lu). Then, there exists a G (0, 00) 
and an increasing sequence {lk}k>o such that liminffe^oo ||V/(f?;^. )|| > a. Since 
lim ini f{&a(ik,i)) ^ /' Lemma O (inequality ([73)) gives 

/-liminf/(0ij <limsup(/(0,(,^_,-)) J) 
<-(£/2)liminf||Vmjf 

A;— i-cso 

< - 0^1/2. 

Therefore, liminffe^oo /(^ifc) > / + /2. Consequently, there exist 6, c G M such 
that f<b<c<f + aP/2, b < f + S and hmsup„^oc /(^n) > c. Thus, there 
exist sequences {mk}k>o, {'T-fc}A:>o with the following properties: < < m^+i, 
/(film J < /(6'n J > c and 

max /(0„) > b (7.9) 

for fc > 0. Then, Lemma [TjH (inequality (|7.3p ) implies 

limsup(/(0„,+i) - /(0™J) < 0, (7.10) 
limsup max J/(^^„) - /(0„,J) < 0. (7.11) 

fe^oo mk<n<a(vak,i) 

Since 

& > /(^™J = /(^^™. + l) - (/(^?™. + l) - f{ern,)) > b - (/((?™, + l) - /(&„.J) 

forfc > 0, (dni yields limfc_.„o /(^m J = b. As /(^^„ J-/(0™ J > c-&forfc > 0, (Em) 
implies a{mk,t) < n^. for all, but infinitely many k (otherwise, liminffc^oo(/(^nfc) ^ 
f{dmk)) < would follow from (|7.1ip V Consequently, liminffe^oo /(6'a(m;.,t))) > b 
(due to (17.9^ 1. while Lemma f inequality (|7.4p ') gives 

< limsup /(0,(,^^^,-)) - =limsup(/((?,(„^ ,-)) - /(0,„J) 

A;— *oc k^oc 

<-(£/2)liminf||V/(0„J||2. 

Therefore, hmfe^oo ||V/(^^„lJ^)|| = 0. Thus, there exists fco > such that 9^'^ e Q 
and /(6I™J > (/ + b)/2 for fc > fco (notice that limfc_,oo fiOm,) ^ b > {f + b)/2). 
Consequently, 6rak £ Q and < {b — f)/2 < ,f{0„i^) — / < (5 for fc > fco (notice that 
f{Omk) < < / + (5 for fc > 0). Then, owing to ((2J)) (i.e., to Assumption [33)1 . we 
have 

< (6 - /)/2 < fiOm,) -f< A#||V/(^?,„J|r 

for k > fco. However, this directly contradicts the fact limfc^oo ll^/(^r)ifc)|| — 0. 
Hence, lim„^oo V/(6'„) = on A \ Nq. □ 

Lemma 7.4. Suppose that Assumptions 2.1 - 2.3 hold. Then, $\lim_{n\to\infty} f(\theta_n) = \hat{f}$ 
on $A \setminus N_0$. 

Proof. We use contradiction to prove the lemma's assertion: Suppose that / < 
limsup„^3o f{On) for some sample lu e A\ A'o (notice that all formulas which follow in 
the proof correspond to this uj). Then, there exists a G M such that f < a < f + S and 
limsup„^Q^ /(0„) > a. Thus, there exists an increasing sequence {nk}k>Q such that 
f{9nk) < o- and /(0„j^+i) > a for A: > 0. On the other side, Lemma [721 (inequality 
(j7.3p ') implies 

limsup(/(^^„,+l)-/(0„J) <0. (7.12) 

Since 

a > fiOrJ = - (/(^„.+i) - fiOnJ) > a - - /(0„J) 

for fc > 0, (|7.12p yields limfe^oo fi^nk) = Consequently, there exists fco > such 
that G Q and /(6'„^) > (/ + a)/2 for k > ko (notice that limfe^oo f{Onk) = a > 
(/ + a)/2). Thus, e Q and < {a-f)/2 < /(6I„J-/ < (5 for A: > /cq (notice that 
/(^nj < a < / + (5 for /c > 0). Then, due to ((27)) (i.e., to Assumption [2?3)) . we have 

< (a - /)/2 < /(0„J - / < M||V/(0„jf 

for /c > ko. However, this directly contradicts the fact lim„^oo V/(6'„) = 0. Hence, 
lim„_,oo/(0„) = /on A\7Vo. □ 

Lemma 7.5. Suppose that A ssumvtions [KT\ - \2.3\ hold. Moreover, let e G (0, oo) 
he an arbitrary positive real number. Then, there exist random quantities C2, C3 
(which are deterministic Junctions of r, C, M) and a non-negative integer-valued 
random variable such that 1 < C2,C3 < 00, Q < < 00 everywhere and such that 
the following is true: 

("(^a(n,t)) - <0n) + i || V/((?„) |1 < 0, (7.13) 

(u(0,(„,,-)) - u{6r,) + (f/Cs) «(0„)) Ib,._, < 0, (7.14) 
(^^(^a(„.t)) - «(^") - {ilC^){Uy^))-^'^) Ic,^_^ > (7.15) 
on A\ No for n> t^, where 

= {itHOr,)] > C2mw)f] u {it\\vf{e^)r > C2{M^)f}, 

Bn,e = {iMOn) > C2{Mw) f} H {A = 2}, 

C„,e = {Yn<e,a) > ^2 ((/., (l.) )'^ } H { l/(0,(„,,-) ) > o} H {A < 2} . 

Remark. Inequalities ^7.13^ - J?. J5p can &e represented in the following equiv- 
alent form: Relations 

(7^|n(0„)| > C2(</'e(ii'))'' V 7,^I|V/(0„)|P > C'2(</'e(w^))'') A n > 

^ ii(e,(„,£)) < ii(e„) - £||V/(0„)ll V4, (7.16) 
7P«((?„) > C2((/.e(w))'' A A = 2 A n > r, 

=^ "(^a(n,t)) < (l - ^74) U{e^), (7.17) 

7,>(^«) > C'2(0e(u;))'' A u(0,(„,£)) > A A < 2 A 71 > 

^ <^a(„,t)) > «(^") + (<7C'3)(</'e(^))-''/'^ (7.18) 
are true on $A \setminus N_0$. 

Proof. Let C = 8Cl^'^/i, C2 = C^M and C3 = 8Af ^ max{l, r}, while 

n = max (|n > : 6'„ ^ q} U {0}) , 
fa = max > : |w(6'„)| > U {0}) , 

f3,e - max ({n > : ^-P^\Mw)r^' < In'^^M^)} U {0}) (7.19) 

and Te = maxjCTe, fi, f2, fa gj/A^^i, . Obviously, is well-defined. On the other side, 
Lemmas 17. 3[ 17.5] imply < < 00 everywhere (in order to conclude that fa is finite, 
notice that lim„^co u{On) = on A \ TVq; in order to deduce that r^^f, is finite, notice 
that p/2 < p/ fjb when /i < 2, and that the left and right hand sides of the inequality 
in (|7.19p are equal when fi = 2). Moreover, we have 

j-P^HM^)r^' > ln^''^Uw) (7.20) 
on A \ A^'o for n > t^. Since > cr^ on A \ Nq, Lemma [TjH (inequality (|7.4p ) yields 

«(f?aK£)) -«(^n) < -t||V/(0„)||V2 + 7„-^/''l|V/(e„)||0e(«^) +C'l7„-'^/''('^s(^«))' 

(7.21) 

on A \ A^o for n > t^. As On G Q and |u(6'„)| < (5 on A \ A^o for n > r^, ([27]) (i.e.. 
Assumption 12.3( 1 implies 

\u{9n)\<Amf{0n)r (7.22) 

on A \ A/'o for n > t^. 

Let Lo be an arbitrary sample from A \ A'o (notice that all formulas which follow in 
the proof correspond to this u). First, we show (|7.13p . We proceed by contradiction: 
Suppose that (|7.13p is violated for some n > t^. Therefore, 

u{e,(n,i)) - uiOn) > -i||V/((?„)f /4 (7.23) 
and at least one of the following two inequalities is true: 

\u{en)\>C2Mjn^{^e{w)r, (7.24) 

\\'^f{0nW>C2j~^{M^)r- (7.25) 
If ([TTM)) holds, then (HI^ imphes 

l|v/(0„)|| > {\u{en)\/My/^ > {C2/Mf'^l-^^^Uw) > c^-^'^Uw) 

(notice that (Ca/Af)^/'' > (C'a/M)^/^ = C owing to /i < 2). On the other side, if 
dTTlSl) is satisfied, then (fT^ yields 

l|v/(&„)|| > cl'^^-^/\u^)Y'^ > C^n^'Heiw). 

Thus, as a result of one of (fTMl) . (fr25| . we get 

iiv/(0„)ii>c7-^/^0a«^). 
Consequently, 

(notice that Ci/8 ^ C\l'^ > 1, CH/8 ^ 8Ci/i > Ci). Combining this with dZHl), we 
get 

U{0ain,i)) - <0n) < || V/((?„) || 2/4, (7.26) 

which directly contradicts (|7.23[) . Hence, (|7.13p is true for n > t^. Then, as a result 
of (|7.22p and the fact that Bn.e Q An^^ for > 0, we get 

("(^a(„,t)) - <e.n) + (t/Ca) U(0„)) /s„., 

< ("(^a(„,t)) - ^iOn) + (Mi/C,) ||V/(0„)f ) /b„,, 

< {^((^ainJ)) - u{e„)+i\\yf{OnW/^) Ib^.. < 



for n > (notice that u{dn) > on Bn.e for each n > 0; also notice that C3 > AM). 
Thus, (|7.14p is true for n> t^. 

Now, let us prove (|7.15p . To do so, we again use contradiction: Suppose that 
()7.14|) does not hold for some n > r^. Consequently, we have /t < 2, u(6'a(nt)) > ^ 
and 

jt^iSn)>C2{cbeHf>0, (7.27) 

viOain^i)) - < {t / C^) (M^))- . (7.28) 

Combining (|7.27p with (already proved) (|7.13p . we get (|7.26p . while /t < 2 implies 

2/fi^l + l/ifif) <l + l/p (7.29) 

(notice that f = 1/(2 — ji) owing to /i < 2; also notice that p = (lm.in{r, f} < /if). As 
< uiOn) < ^ < 1 (due to ((7??7|) and the definition of r^), inequalities (fT^ . ((7?^ 
yield 

||V/(0„)f > (^.(^.O/m)'^'' > {u{en) f+^'^ IM^ (7.30) 

(notice that M^/A < £p due to < 2, Af > 1). Since ||V/(6I„)|| > and < 
"(^a(n,t)) < "(^'rO (duc to ([LSSI) , dLSSl) , dLS!])), inequalities ([7^ . ([7:301) give 

t m(^») -"(^a(«.t)) - a "^^") ~"(^a(n,f)) 
4- ||V/(0„)P - («(0n))'+^ 



("(^«))'+^ 



a(„,t)J 



--pM^ («(V,£))-t'(^?«)). 
Therefore, 

^{Oain,i)) - V{9n) > t / {ApM^) > {t / C3) 

(notice that p < r, C3 > ArlVP), which directly contradicts I^L2E^ . Thus, ((TTSl is 
satisfied for n > t^. □ 

Lemma 7.6. Suppose that Assumvtions [K7\ ~ \2.3\ hold. Moreover, let e £ (0, cx)) 
be an arbitrary positive real number. Then, 

u{en) > ~C2j-p{M^) f (7.31) 

on A\Nq for n > t^. Furthermore, there exists a random quantity C4 G [1, 00) (which 
is a deterministic function of r, C , M ) such that 1 < C4 < 00 everywhere and such 
that 

||V/(0„)f < (74 (^(u(0„)) +7,;^^(</>,H)^) (7.32) 

on K\Nq for n > t^, where function </?(•) is defined by ip{x) = x I(o,oo)(a^)7 x G M. 

Proof. Let C4 = AC2 / i, while w is an arbitrary sample from A \ A''o (notice that 
all formulas which follow in the proof correspond to this lu). 

First, we prove (|7.3ip . To do so, we use contradiction: Assume that (|7.3ip is not 
satisfied for some n > r^. Define {nk}k>Q recursively by no = n and — a{nk~i,t) 
for k > I. Let us show by induction that {u{9n^)}k>Q is non-increasing: Suppose that 
u{Oni) — ""(^"i-i) foi' < I < k. Consequently, 

uie,,,) < j.(0„„) < -C2j-,nMw)f < -C2i-!mw)Y 

(notice that {7„}„>o is increasing). Then, Lemma [7?5] (relations (|7.13p . (|7.16p ) yields 

w(0„,^J - u(0„J < -f||V/((?„Jf /4 < 0, 
i.e., u{9nk+i) < u{9nk)- Thus, {u{6n^)}k>Q is non-increasing. Therefore, 

limsupu(6'„J < u{9no) < 0- 

n — >oo 

However, this is not possible, as lim„_»oo u(0„) = (due to Lemma Fr4|) . Hence, (I7.3ip 
indeed holds for n > r^. 

Now, (|7.32p is demonstrated. Again, we proceed by contradiction: Suppose that 
l|7.32p is violated for some n > t^. Consequently, 

I|v/((?„)IP > c,^~H<l>eHf > C2i-^{Uw)f 

(notice that 6*4 > C2), which, together with Lemma [7.51 (relations (|7.13p . (|7.16p ). 
yields 

«(^a(n,t))-"(^«)<-t||V/(0„)f/4. 

Then, ((73T|) imphes 

||V/(0„)f <(4/£)(u(0„)-^.(^„(„^,-))) 
However, this directly contradicts our assumption that $n$ violates (7.32). Thus, (7.32) 
is indeed satisfied for n > r^. □ 

Lemma 7.7. Suppose that Assumvtions - hold. Then, there exists a 
random quantity C'z (which is a deterministic function of r, C, M) such that 1 < 
< oo everywhere and such that 

liminf7Pu(0„) <C5(0Mr (7.33) 

n — >-C50 

onA\No. 

Proof Let C5 = (72 +C'|''. We prove ([7^ by contradiction: Assume that ((7?^ 
is violated for some sample uj from A \ Nq (notice that the formulas which follow in 
the proof correspond to this to). Consequently, there exist e G (0, cxd) and uq > 
such that 

"(^n) > C5j-P{Mw)f (7.34) 

for n > uq. Let {nk}k>o be defined recursively by Uk = a{nk-i,t) for fc > 1. In what 
follows in the proof, we consider separately the cases fi < 2 and fi = 2. 
Case fi < 2: Due to (|7.34p . we have 

(notice that p < 2r). On the other side. Lemma [7.51 (relations (I7.15p . (|7.18p ) and 
(fTMll yield 

Vier.,^j~v{9rj > {i/CsmH)-f^^^ > (1/C3)(7„. + , -7nJ(0e(«^))-''/^ 

for fc > (notice that C5 > C2; also notice that t > 'juk+i ~ Juk)- Therefore, 

fc-i 

(1/C3)(7n. -7no)(0e(u;))-^/^^ <5](z;(^„, + J -«(0„J) 

1=0 

=w(6'„J - v{9n„) 

for fc > 1. Thus, 

(l-7no/7nJ<C'34"'^^'''^ 

for fc > 1. However, this is impossible, since the limit process fc — > 00 (applied to the 
previous relation) yields 6*3 > C'g^'^^'' (notice that > Cf''). Hence, (|7.33p holds on 
A \ A^o when fi <2. 

Case fi = 2: As a resuh of Lemma [731 (relations ([7H)l . ([7T7)) ') and ([731) . we 

get 

Ui9n,^j < (1 - i/C3)u{9n,) < (l - (7„, + , - 7nJ/C'3) «(^«J 

for fc > 0. Consequently, 



1=1 

<u(6'„Jexp -(1/C'3)^(7n, -7n.-i) 
\ i=l / 

='u(6'„J exp (-(7„, - 7«o)/C'3) 
for A: > 0. Then, ([73^1) yields 

for fc > 0. However, this is not possible, as the limit process k ^ oo (applied to the 
previous relation) implies C5{(j)e{w))^ < 0. Thus, (|7.33p holds on A \ iVo also when 
A = 2. □ 
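
The exponential bound used in the second case of the proof above rests on the elementary inequality $1 - x \le e^{-x}$. The following worked step is a sketch, assuming the one-step contraction $u(\theta_{n_i}) \le \big(1 - (\gamma_{n_i} - \gamma_{n_{i-1}})/C_3\big) u(\theta_{n_{i-1}})$ stated in the proof, with $u(\theta_{n_i}) > 0$ and increments $\gamma_{n_i} - \gamma_{n_{i-1}}$ not exceeding $C_3$ (any accents on the constants are omitted here):

```latex
\begin{align*}
u(\theta_{n_k})
  &\le u(\theta_{n_0}) \prod_{i=1}^{k}
       \left( 1 - \frac{\gamma_{n_i} - \gamma_{n_{i-1}}}{C_3} \right) \\
  &\le u(\theta_{n_0}) \exp\left( - \frac{1}{C_3}
       \sum_{i=1}^{k} \left( \gamma_{n_i} - \gamma_{n_{i-1}} \right) \right) \\
  &= u(\theta_{n_0}) \exp\left( - \frac{\gamma_{n_k} - \gamma_{n_0}}{C_3} \right).
\end{align*}
```

The last equality is a telescoping sum; since $\gamma_{n_k} \to \infty$, the right-hand side decays faster than any power of $\gamma_{n_k}$, which is what produces the contradiction in the proof.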

Lemma 7.8. Suppose that Assumvtions [KT\ - \2.3\ hold. Then, there exists a 
random quantity Cq (which is a deterministic function of r, C, M) such that 1 < 
Cq < oo everywhere and such that 

limsup7^ uie^) < Ce{<PHr (7-35) 

n — 'oo 

on A\ Nq. 

Proof Let Ci ^ Ci + C4 + C5, C2 = 6C1C2 + Cf and Ce = 2(C'i + €2?. We 
use contradiction to show (|7.35p : Suppose that (|7.35p is violated for some sample 
u) from A \ A'o (notice that the formulas which appear in the proof correspond to 
this w). Then, it can be deduced from Lemma FfTfl that there exist e G (0, 00) and 
no > mo > Te such that 

7^o"(^™o) < C2{<^e(w))\ (7.36) 
Yn„u{0no)>Ce{M^)r, (7.37) 

min jt^{0n)>C2mw)r, (7.38) 

mo <Ti<no 

max jP u{9,,) <Ce{Mw)r (7-39) 

mo <n<no 

(notice that C2 > Ci > C^) and such that 

iMmo,i)hn^of < inin{2, (1 - t/Cs)-'}, (7-40) 

i;;.f^^m^)r <%ntiM^)f (7.41) 

(to see that (j7.40p holds for all, but finitely many mo, notice that lim„^oo la{n t)Mn — 
1; to conclude that (|7.4ip is true for all, but finitely many mo, notice that 2p/ jl > p 
if /i < 2 and that the left and right-hand sides of (|7.4ip are equal when fi — 2). 

Let l() — a(mo, i). As a direct consequence of Lemmas l7.21 frelations (|7.3p . (|7.32p ) 
and (fTiTj) . we get 

w(^„) - uiOmo) <7™f 1|v/(0™J||0e(^«) + Ca^nf^^iMy^))' 

<||V/(e„J||V2 + ((7i + l/2)^~f/^Mw)? 
<C, v{u{9„J) + (Ci +C, + l)j-P{Mw)r 

<Ci {^{u{d„,J) + i;;i{Mw)r) (7.42) 

for mo < n < /o (notice that C1+C4 + I <Ci). Then, 1^^, ^M), (|7:^ yield 

u{9rao) + CM^O^a)) >^(^?m„ + l) " Ci7„f (<^s (^«) )^ 

> {C2{lmo + lhrrJ-^ ~ ^l) iV^U^)? 

>{C2l2-Cr)iZ^U^)f>^ (7.43) 
(notice that {'^ma+i/lmaY < {liohmoY < 2; also notice that C2/2 > 3Ci), while 
(EMU, ^M, imply 

w(0„) <(1 + + Ci7„f (^/-eH)^ 

<(Ci+C'2 + CiC'2)74(0sHr 
<(Q/2)(7„/7™o)^„''^(0e(«^))'' 

<C67-^(0.(^«))'' (7.44) 

for mo < n < Iq (notice that (7n/7mo)^ ^ {liallmoY < 2 for toq <n<li^\ also notice 
that (76/2 = (Ci + (52)2 > Ci + 6*2 + C'iC'2). Due to (17:771) . (17:^ . (HMD, we have 
Zo < n^- On the other side, as x + Ci^p{x) > only if a; > and x+Ci(p{x) = {1 + Ci)x 
for a; > 0, inequality (I7.43P implies 

uie,no) >(1 + Ci)-i(C2/2 - C\)^;ZiMw)f > C27™f (0.(«^))'' (7.45) 

(notice that C2/2 - Ci > (71(3(72 - 1) > 2C1C2 > (1 + (7i)(72). 

In what follows in the proof, we consider separately the cases /i < 2 and fx = 2. 
Case /t < 2: Owing to Lemma [73] (relations (fTTSl) . (fTTS)) ) and ((7361) . ll735)) . we 

have 

v{ei,)>v{e„,,) + {iic^){U^))-^'^ 

>min{(72-'/^(73-i}7io(0e(u;))-'^/P 
(notice that i > ji^ — 7™^; also notice C2 < C2 < (7;^""^). Consequently, 

However, this directly contradicts (|7.38p and the fact that Iq < uq. Thus, (|7.35p holds 
when fi <2. 

Case fL = 2: Using Lemma [731 (relations ([7Tl[l . ([7TT71) ) and ([735]) . we get 
u(^;o) < (l-i7(?3)u(0„J. 

Then, ([7361) . ([730[l yield 

However, this is impossible due to (|7.38p and the fact that Iq < uq. Hence, (|7.35p also 
in the case fi — 2. D 

Proof of Theorems 12. II and 12.21 Theorem l2.1l is an immediate consequence 
of Lemmas 17.21 17.31 To show Theorem 12.21 we use the following notations: K = 
((72 + (74 + (76)^ L = KN. Then, Lemmas [73| and [TJj imply 

limsup7^|w(^„)| < ((72 + Ce){^{w)r (7.46) 

n — 'oo 
on A \ iVg. On the other side, Lemma [7.51 and (|7.46p yield 

hmsup7P||V/(0„)f fCQl^H)'' +C'4hmsup7^^(u(0„)) 

n — >oo n^oo 

<{C2 + Ci + Ce)H4'{w)r (7.47) 
on A \ iVo- Combining ([7^ . ((7^ with Assumption \IM we get 

Hm sup 7« d(0„ , ^) < TV Um sup (7P 1 1 V/ (0„ ) f ) 

<N{C2+Ci + Ce)\<j){w)Y (7.48) 

on A \ TVo- As a direct consequence of (fTISl) - ([7^ . we have that ((^ - (P?TU)) are 
satisfied on A \ A'q. Hence, Theorem 12.21 holds, too. □ 

8. Proof of Theorem 3.1. The following notation is used in this section. For 
$\theta \in \mathbb{R}^{d_\theta}$, $\xi \in \mathbb{R}^{d_\xi}$, $E_{\theta,\xi}(\cdot)$ denotes $E(\cdot \mid \theta_0 = \theta, \xi_0 = \xi)$. Moreover, let 

wi,n = F{en,^n+i) - (nF)(^?„,e„), 

W2,n = (nF)(0„,en) - (ni?)(0„_i,e„), 
W3,n = -(nF)(0„,e„ + l) 



for $n \ge 1$. Then, it is obvious that algorithm (3.1) admits the form (2.1), while 
Assumption 3.2 yields 



k k k k 

ai-/^W, = a.'^lwi^, + Y OitllW2,i - X!("»Tr ~ Ot^+ll^+l)'^3,^ 
i—n i—n i—n i—n 

- afc+l7fc+iW3,fc + anJnW3,n-l (8.1) 

for 1 < n < k. 

Lemma 8.1. Let Assumvtion \3. 1\ hold. Then, there exists a real number s £ (0, 1) 
such that X]^o '^n^^Tn < 

Proof. Letp== (2 + 2r)/(2 + r), (7= (2 + 2r)/r, s = (2 + r)/(2 + 2r). Then, using 
the Holder inequality, we get 

oo oo , \ \lq / oo \ 1/P / oo \ 1/9 

n=0 n=l \ ^n/ \„^i / \„^i / 

Since 7„+i/7„ = 1 + an /in = 0(1) for n ^ oo and 

oo oo oo / V 2 I , J. -1 / \ 2 

^1 7^ £l 7^ £l V 7n y 71 »>o V 7n 



it is obvious that X]^o '^r!^'*7n converges. □ 

Proof of Theorem 13.11 Let Q C M''" be an arbitrary compact set, while 
s e (0, 1) is a real number such that X^^o '^^i^'^ln < Obviously, it is sufficient to 
show that Yl,n=Q(^nlnWn couvcrgcs w.p.l on nj^oi^n ^ Q)- 
Due to Assumption 3.1, we have 

(a„_i - anhn = ("«^ - "n-i) (1 + an-iia-^ - a^^)) a\i'^ = 0(a\i'^), 
a«(7n+i - tD = "nTn ((1 + c^nllnY - 1) = a«7^ (ra«/7n + o(a„/7„)) = 0(0^7;) 
as n — *■ 00. Consequently, 

oc 

^<a„+i7;;+i < oo> (8-2) 

n=0 

00 00 00 

^ |a„7; - a„+i7;+i| < XI ~ + Yl l"» " l7^+i < 00. (8.3) 



Ti— n— n— 



On the other side, as a result of Assumption 13.31 we get 

£■9,5 (l|w'l,n|P^{rQ>«}) <2-Ee,^ (¥'Q^s(Cn+l)i"{rQ>n}) + 2£^e.e (v'q.s (^n)^{rQ >n-l}) , 

-Ee.c (Ik2,«|p/{TQ>«}) <£^e,c {vqAinWn - 6'„_i||''/{^q>„_i}) 

<<_l£^e,e (</'Q,s(Cn)/{rQ>n-l}) , 

for all 6* G M''«, C e K'^S « > 1- Then, Assumption O and jS^l) yield 

y]an7^lk2,n||^{rQ>n} < V<_ia„7; SUp Se,^ ((^^^ )/{ >„} ) 

for any 6* e R''<' , ^ e R''? , while ([13]) implies 

X l""7^ - an+i7rVil sup {Eb,^ ((^I ,(^„)/{^q>„}))^^^ < 00, 



< 00 



\n=l 



\n=l / "2:" 

for each e W'^" , £, e R'^'^ . Since 

EeX {wi^nI{rQ>n}\^n) = [Sg^^ , 1 ) l-^n) - (nF)(0„,^„)) /{.q>„} = 

w.p.l for every 9 e M'''', ^ G IR.''«, n > 1, it can be deduced easily that series 

CXD oo oo 



n— 1 n=l n— 1 
converge w.p.l on Hj^oi^n ^ *5}' ^^^^ ^'^^^ lim„^oo anT^w^a.n-i = w.p.l on 
the same event. Owing to this and (|8.ip . we have that J2'^=o Q^n7n^n converges w.p.l 
on aTLol^" e Q}- ° 

9. Proof of Theorems 4.1 and 4.2. In this section, we use the following 
notation. For $\theta \in \mathbb{R}^{d_\theta}$, $x \in \mathbb{R}^{d_x}$, $y \in \mathbb{R}$, $\xi = (x, y)$, let 

F{0,O = iy-Geix))He{x), 



while = ixn,yn) for n > 0. With this notation, it is obvious that algorithm (|4.1 
admits the form (|3.ip . 
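To make this notation concrete, the following sketch implements one step of a recursion of the form $\theta_{n+1} = \theta_n - \alpha_n F(\theta_n, \xi_{n+1})$ for a small two-layer approximator. It is an illustration under stated assumptions rather than algorithm (4.1) verbatim: the architecture $G_\theta(x) = \sum_{i} a'_{i}\tanh(\sum_{k} a''_{i,k}\psi_k(x))$, the choice of tanh as activation, the convention $H_\theta = \nabla_\theta G_\theta$, the sign convention of the update, and all function names are hypothetical.

```python
import math


def hidden_inputs(a2, psi):
    """Pre-activations z_i = sum_k a2[i][k] * psi[k]."""
    return [sum(w * p for w, p in zip(row, psi)) for row in a2]


def G(a1, a2, psi):
    """Two-layer approximator: sum_i a1[i] * tanh(z_i) (assumed form)."""
    return sum(w * math.tanh(z) for w, z in zip(a1, hidden_inputs(a2, psi)))


def H(a1, a2, psi):
    """Gradient of G with respect to the parameters (a1, a2)."""
    z = hidden_inputs(a2, psi)
    grad_a1 = [math.tanh(zi) for zi in z]
    grad_a2 = [[w * (1.0 - math.tanh(zi) ** 2) * p for p in psi]
               for w, zi in zip(a1, z)]
    return grad_a1, grad_a2


def F(a1, a2, psi, y):
    """F(theta, xi) = -(y - G_theta(x)) * H_theta(x) for xi = (x, y)."""
    err = y - G(a1, a2, psi)
    g1, g2 = H(a1, a2, psi)
    return [-err * g for g in g1], [[-err * g for g in row] for row in g2]


def sgd_step(a1, a2, psi, y, alpha):
    """One step theta <- theta - alpha * F(theta, xi)."""
    f1, f2 = F(a1, a2, psi, y)
    new_a1 = [w - alpha * f for w, f in zip(a1, f1)]
    new_a2 = [[w - alpha * f for w, f in zip(row, frow)]
              for row, frow in zip(a2, f2)]
    return new_a1, new_a2
```

Here `psi` stands for the vector of feature values $(\psi_1(x), \dots, \psi_{N_2}(x))$ evaluated at the current input; with a small step size, a single call to `sgd_step` reduces the squared error on the sample it was computed from.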

Proof of Theorem 4.1. Let $\theta = [a'_1 \cdots a'_{N_1}\ a''_{1,1} \cdots a''_{N_1,N_2}]^T \in \mathbb{R}^{d_\theta}$, while



2ii:LAri7V2(l 



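and $U_\theta = \{\eta \in \mathbb{C}^{d_\theta} : \|\eta - \theta\| < \delta_\theta\}$ ($\varepsilon$ is specified in Assumption 4.1). Moreover, for
$\eta = [b'_1 \cdots b'_{N_1}\ b''_{1,1} \cdots b''_{N_1,N_2}]^T \in \mathbb{C}^{d_\theta}$, $x \in \mathbb{R}^{d_x}$, let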

/ Afl / ^2 \ \ 

\41 = 1 \42 = 1 // 



Then, we have 



JV2 



N2 



i2 = l 



i2=l 



N2 



for all $\eta = [b'_1 \cdots b'_{N_1}\ b''_{1,1} \cdots b''_{N_1,N_2}]^T \in U_\theta$, $1 \le i_1 \le N_1$ and each $x \in \mathbb{R}^{d_x}$ satisfying
$\max_{1\le k\le N_2} |\psi_k(x)| \le L$. Consequently, Assumption 4.1 implies



n=i \i2=i / ii=i \i2=i y 



< E i^n - «^ 
EK 



^2 ^E ^i'l.JsV'^aW^ 



11=1 



N2 



N2 



^42 = 1 



<5gKNi+KY,Wn 



n=i 



Af2 



^»2 = 1 
^2 



E K,^2'^^2{x) - E a^l,^2^i2(2^) 



i2=l 



< SgKNi + (5eifL7ViiV2||6i|| < e 



for any $\eta = [b'_1 \cdots b'_{N_1}\ b''_{1,1} \cdots b''_{N_1,N_2}]^T \in U_\theta$ and each $x \in \mathbb{R}^{d_x}$ satisfying $\max_{1\le k\le N_2} |\psi_k(x)| \le L$.
Then, it can be deduced that for all $x \in \mathbb{R}^{d_x}$ satisfying $\max_{1\le k\le N_2} |\psi_k(x)| \le L$,
$\hat G_\eta(x)$ is analytic in $\eta$ on $U_\theta$. Moreover, Assumption 4.1 yields



Ni 



\G,ix)\<K[l+J2\b[^\ 



^2 {t.KM^)^ 



d 



db 



8h" 



<^'(i + hll), 

\i2=l 



VJl = l 

( N2 



\12 = 1 



^2 (T.^'L^2^^^i^)]bkbl,k2^k2{x) 

\i2 = l 

<K^L\\r]"'^ 



for all $\eta = [b'_1 \cdots b'_{N_1}\ b''_{1,1} \cdots b''_{N_1,N_2}]^T \in U_\theta$, $1 \le k_1 \le N_1$, $1 \le k_2 \le N_2$ and each
$x \in \mathbb{R}^{d_x}$ satisfying $\max_{1\le k\le N_2} |\psi_k(x)| \le L$. Therefore,

||V^G„(a;)|| <K^LNiN2il + \M)^ 

for any $\eta \in U_\theta$ and each $x \in \mathbb{R}^{d_x}$ satisfying $\max_{1\le k\le N_2} |\psi_k(x)| \le L$. Thus,

||V„(y - G„(x))2|| = 2Kj - G„(x)|||V,G,(a;)|| < 4K^L'N,N2{1 + 

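for all $\eta \in U_\theta$ and each $x \in \mathbb{R}^{d_x}$, $y \in \mathbb{R}$ satisfying $\max_{1\le k\le N_2} |\psi_k(x)| \le L$, $|y| \le L$.
Then, the dominated convergence theorem and Assumption 4.2 imply that $\hat f(\cdot)$ is
differentiable on $U_\theta$. Consequently, $\hat f(\cdot)$ is analytic on $U_\theta$. Since $\hat f(\theta) = f(\theta)$ for all
$\theta \in \mathbb{R}^{d_\theta}$, we conclude that $f(\cdot)$ is real-analytic on entire $\mathbb{R}^{d_\theta}$. □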

Proof of Theorem 4.2. As $\{\xi_n\}_{n\ge 0}$ can be interpreted as a Markov chain
whose transition kernel does not depend on $\{\theta_n\}_{n\ge 0}$, it is straightforward to show
that Assumptions 3.2 and 3.3 hold. The theorem's assertion then follows directly
from Theorem 3.1. □



10. Proof of Theorems 5.1 and 5.2. In this section, we rely on the following
notation. For $n \ge 0$, let $\xi_{n+1} = (x_n, x_{n+1}, y_n)$, while

$$F(\theta,\xi) = -(c(i) + \beta G_\theta(j) - G_\theta(i))\,y$$

for $\theta, y \in \mathbb{R}^{d_\theta}$, $i, j \in \mathcal{X}$ and $\xi = (i, j, y)$. Moreover, let



Ilgi{z,j,y),{i',f) X B) =P(a G (*',/) xB\Co = 

+ Hg{j))P{x, = j'\xo - 

for $\theta, y \in \mathbb{R}^{d_\theta}$, $B \in \mathcal{B}^{d_\theta}$, $i, i', j, j' \in \mathcal{X}$. Then, it is straightforward to verify that
recursion (5.1), (5.2) admits the form of the algorithm studied in Section 3.

The following notation is also used in this section. $e$ is an $N$-dimensional column
vector all of whose components are one. For $1 \le i \le N$, $e_i = [e_{i,1} \cdots e_{i,N}]^T$ is an
$N$-dimensional column vector such that $e_{i,i} = 1$ and $e_{i,k} = 0$ for $k \ne i$. $P$ and
$\pi$ denote (respectively) the transition probability matrix and the invariant column
probability vector of $\{x_n\}_{n\ge 0}$ (notice that the $(j,i)$ entry of $P$ is $P(x_1 = j \mid x_0 = i)$).
Furthermore, $c = [c(1) \cdots c(N)]$ and $g = c\sum_{n=0}^{\infty} \beta^n P^n$, while $G_\theta = [G_\theta(1) \cdots G_\theta(N)]$,
$\tilde G_\theta = c + \beta G_\theta P - G_\theta$ and $H_\theta = [H_\theta(1) \cdots H_\theta(N)]$ for $\theta \in \mathbb{R}^{d_\theta}$ (notice that $c$, $g$, $G_\theta$,
$\tilde G_\theta$ are row vectors).
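As a quick sanity check on this notation, the row vector $g$ can be computed for a small chain and compared against the fixed-point relation $g = c + \beta g P$, which the series $c\sum_{n\ge 0}\beta^n P^n$ satisfies whenever it converges. The sketch below is illustrative only: it reads the definition of $g$ above as that series, assumes the convention that entry $(j,i)$ of $P$ is the probability of moving from state $i$ to state $j$, and simply truncates the sum. The two-state matrix, the cost vector, the discount value and the function names are hypothetical choices.

```python
def row_times_matrix(v, P):
    """Row vector v times matrix P (P given as a list of rows)."""
    n_rows, n_cols = len(P), len(P[0])
    return [sum(v[i] * P[i][j] for i in range(n_rows)) for j in range(n_cols)]


def discounted_cost(c, P, beta, n_terms=200):
    """Truncated series g = c * sum_{n < n_terms} beta^n P^n."""
    g = [0.0] * len(c)
    term = list(c)  # c * beta^0 * P^0
    for _ in range(n_terms):
        g = [gi + ti for gi, ti in zip(g, term)]
        term = [beta * t for t in row_times_matrix(term, P)]
    return g


# Hypothetical two-state chain; column i holds P(x_1 = . | x_0 = i).
P = [[0.9, 0.5],
     [0.1, 0.5]]
c = [1.0, 2.0]
beta = 0.5

g = discounted_cost(c, P, beta)
residual = [gi - (ci + beta * gpi)
            for gi, ci, gpi in zip(g, c, row_times_matrix(g, P))]
```

With this convention the $i$-th component of $g$ is the expected discounted cost accumulated when the chain starts in state $i$, and `residual` should be zero up to truncation and rounding error.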

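Lemma 10.1. Let Assumptions 5.1 and 5.2 hold. Then, there exists a real number
$\varepsilon \in (0, 1)$ and for any compact set $Q \subset \mathbb{R}^{d_\theta}$, there exists another real number $C_Q \in
[1, \infty)$ such that

$$\|(\Pi^n F)(\theta,\xi) - \nabla f(\theta)\| \le C_Q n \varepsilon^n (1 + \|y\|),$$

$$\|((\Pi^n F)(\theta',\xi) - \nabla f(\theta')) - ((\Pi^n F)(\theta'',\xi) - \nabla f(\theta''))\| \le C_Q n \varepsilon^n \|\theta' - \theta''\| (1 + \|y\|),$$

for all $\theta, \theta', \theta'' \in Q$, $y \in \mathbb{R}^{d_\theta}$, $i, j \in \mathcal{X}$ and $\xi = (i, j, y)$.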

Proof. Let $Q \subset \mathbb{R}^{d_\theta}$ be an arbitrary compact set, while $\varepsilon \in (0, 1)$, $C \in [1, \infty)$ are
real numbers such that $\varepsilon > \max\{1/2, \beta\}$, $\|P^n\| \le C$ and



for $n \ge 0$ (the existence of $\varepsilon$, $C$ is ensured by Assumption 5.1). Moreover, $C_{1,Q} \in
[1,\infty)$ denotes an upper bound of $\|G_\theta\|$, $\|\tilde G_\theta\|$, $\|H_\theta\|$ on $Q$, while $C_{2,Q} \in [1,\infty)$ is a



E (||yn||'/{r,>n}|eO = , ^0 = ^) < Cq{1 + \\y\\)^ 



(10.1) 



IIP" -^e^ II <Ce 





k=0 

n-1 



- P'^yGgP^-'ej - P'^Hg dmg{GgP'')P 




TT 



Y,f3'HgdmgiGgP''){P 



m — k—l 



for $\theta, y \in \mathbb{R}^{d_\theta}$, $i, j \in \mathcal{X}$ and $\xi = (i, j, y)$. Therefore,



m"F){0,o-^fm\ 



oo n — 1 



< G(7i,Q/3" ||y|| + CCIq + ^'^Iq E f^'^'' 



k—n k—0 



<CQn£"(l + ||y||) 
for all $\theta \in Q$, $y \in \mathbb{R}^{d_\theta}$, $i, j \in \mathcal{X}$, $n \ge 0$ and $\xi = (i, j, y)$. Moreover,

II {{ii^F){e',o - wf{9')) - {{n^F){9",o - v/(0")) || 

<Cf3"\\y\\\\Gg, - Gg„\\+CC,,Q{\\Gg,-Gg„\\ 

oo n — 1 

+ \\Hg, - Hg4) Y.I3' + C'C^,Q{\\Gg, - Gg„\\ + \\He, - Hg.W) J2 

k—7i k—0 

<CQne^0' ~0"\\{l + \\y\\) 

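for any $\theta', \theta'' \in Q$, $y \in \mathbb{R}^{d_\theta}$, $i, j \in \mathcal{X}$, $n \ge 0$ and $\xi = (i, j, y)$. On the other hand, we
have

$$\|y_{n+1}\| I_{\{\tau_Q > n+1\}} \le \beta \|y_n\| I_{\{\tau_Q > n\}} + C_{1,Q}$$

for $n \ge 0$. Consequently,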

n-l 

/3||y„||/{.«>„} < boll + Ci,Q E ^ '(1 + llyoll) 

k=0 

for $n \ge 0$, wherefrom (10.1) immediately follows. □

Proof of Theorem 5.1. Since

for each $\theta \in \mathbb{R}^{d_\theta}$ ($\pi(i)$ is the $i$-th component of $\pi$), Assumption 5.2 implies that $f(\cdot)$
is analytic on entire $\mathbb{R}^{d_\theta}$. □

Proof of Theorem 5.2. Using Lemma 10.1, it can be concluded easily that
Assumptions 3.2 and 3.3 hold. Then, the theorem's assertion directly follows from
Theorem 3.1. □

11. Proof of Theorems 6.1 and 6.2. In this section, we use the following
notation. For $n \ge 0$, let

Zn = [X^ y„ ■ • ■ yn-M+lf, in = kT £« -0,1 ■• ' £n-N+l i^n-N+lV^ 

while = L + M + N{dg + 1). For 6* e 6, let eg = • • ■ = E-n+i = ipl = ■ ■ ■ = 
ip-N+i — 0, while {e^}„>o, {V'n}n>o arc defined by the following recursion: 

(t>i-l = [yn-l ■ ■ ■ Vn^M £n-i ■ ' ' £^-Ar]'^, 

si^yn-iK-ifO, 

C = el {^If ■ ■ ■ {€^N+iff, n>l. 

Then, it is straightforward to verify that $\{\varepsilon_n^\theta\}_{n\ge 0}$ satisfies the recursion (6.2), as well
as that $\psi_n^\theta = \nabla_\theta \varepsilon_n^\theta$ for $n \ge 0$. Moreover, it can be deduced easily that there exist a
matrix valued function Gg : Q ^ M''^ ^''^ and a matrix H G R'*':^^ with the following 
properties: 

(i) Gg is linear in 9 and its eigenvalues lie outside {zgC:|z|<1} for each 

6* G e. 
(ii) Equations



hold for all $\theta \in \Theta$, $n \ge 0$.

The following notation is also used in this section. For 6* G 0, z G M^+^^, 
ui, . . . , e M, Vx,...,vn e K'*'' and ^ = [z^ ui vf ■ ■ ■ un vJj]'^, let 

while 

n9(f,B) = £;(/B(G9e + i??«o)) 

for a Borel-measurable set $B$ from $\mathbb{R}^{d_\xi}$. Then, it can be deduced easily that recursion
(6.3) – (6.6) admits the form of the algorithm considered in Section 3. Furthermore,
it can be shown that

{U-cf>)(9,0)^E{{eif), (11.1) 
(n"F)(0,O) = i?(^f,efO = V,(n"0)(0,O) (11.2) 

for each $\theta \in \Theta$, $n \ge 0$.

Proof of Theorem 6.1. Let $m = E(y_0)$ and $r_k = r_{-k} = \mathrm{Cov}(y_0, y_k)$ for
$k \ge 0$, while

oo 

A:— — oo 

for w G [— TT, tt]. Moreover, for G 0, z G C, let Ce{z) — Ag{z) / Bg{z), while 
a9 = l+ max |Ae(e^")|, /3e = min |Be(e'")|, Sg ^ 

we[-7r,7r] i.i)e[-7r,7r] iagag 

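Obviously, $1 \le \alpha_\theta < \infty$, $0 < \beta_\theta, \delta_\theta < \infty$ (notice that the zeros of $B_\theta(\cdot)$ are outside
$\{z \in \mathbb{C} : |z| \le 1\}$).

As $\sum_{k=0}^{\infty} |r_k| < \infty$, $|\varphi(\cdot)|$ is uniformly bounded. Consequently, the spectral theory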
for stationary processes (see e.g. [51. Chapter 2]) yields 

lim Eiei) ^ Cg{l)m, 

hm Cov(£:,e^+fe) = ^ r \Cg{en\M^)e''''du; 

for all $\theta \in \Theta$, $k \ge 0$ (notice that $\varepsilon_n^\theta = C_\theta(q) y_n$ and the poles of $C_\theta(\cdot)$ are in $\{z \in \mathbb{C} :
|z| > 1\}$). Therefore,

1 

= ^J ^ \Cg{en\'vHdu; + \Cg{l)\'— (11.3) 
for any $\theta \in \Theta$. On the other hand, it is straightforward to verify

d 



dak 



-i(jjk 



92 



dakiOak2 

QhA h/jv / \ 



= -{li+l2 + --- + InV- e-''^('i+2'2+-+^'^) 

^ \ ll+l2+---+lN+l 



for every 9 =[ai - ■ ■ Um 61 • • • &jv]"^ G O, w e [— tt, tt], 1 < fc, ki, k2 < M, li,. . . ,1^ > 0. 
Thus, 



gki-\ hfeM+h + ---ijv 



d^' ■ ■ ■ da'l^' db[' ■ ■ ■ db^;^ 



f)ki-\ \-kM 

-T ^M(^^) 



db[' ■ ■ ■ db'j^' \Bg{ei'^) 
< (Zi + ---Z;v)!ae(l//39y^+-'^+' 



for all 9 = [oi • • • um 61 • • • &jv]"^ € Q, lo € [— tt, tt], ki,. . . , ku > 0, . . ,In > 0. 
Then, it can be deduced easily 



for all 61 e e, w e [-tt, tt], fci, . . . , fc^, > [Df'-'^"' denotes d^^+'-'+^-^e /dd\' ■ ■ ■ dd^^l , 
where i?i is the i-th. component of 6). Since 



feiH hfcdfl+l 



fel *^<i6( 



for each 6 G Q, ui G [—• tt, tt], fci, . . . , fc^^ > 0, we have 



<(fci + --- + fcdj!(^^ 
<(A;i + --- + A;dJ!f^ 
< {ki + ■ ■ ■ + kdeV- 



ki + ---+kd„+2 fci 






•E 


31=0 




ki+---+kag+2 fci 




E- 


-E 






^^kl+■■■+kd|,+2 
//^iH /t'c;^ \



for any 6* e 0, w G [— tt, tt], /ci, . . . , fc^/^ > 0. Consequently, the multinomial formula 
(see [lU Theorem 1.3.1]) implies 

ki=0 fcd„=0 



< 



n_u 0<fci,...,fed„<« 



fel+---fcd„ =n 



mm- 



for every $\theta \in \Theta$, $\omega \in [-\pi, \pi]$. Then, the analyticity of $f(\cdot)$ directly follows from (11.3)
and the fact that $|\varphi(\cdot)|$ is uniformly bounded (also notice that $C_\theta(1)$ is analytic in $\theta$).
□

Proof of Theorem 6.2. It is straightforward to show
max{||F(0,OIU(O}< lieil, 

max{||F(0,e') - F{e,ai m') ^ < m' ewiww + wew) 

for aU 6* e 6, C, C e R'*? . Moreover, it can be deduced easily that for any compact 
set Q C W'-", there exist real numbers (5i,q G (0,1), Ci,q G [1,oo) such that ||Gg|| < 
Ci,Q^i,Q and 



Ql 



for each 6*, 6*', 61" G Q, n > 0. Then, the results of gl Section II.2.3] imply that there 
exist a locally Lipschitz continuous function g : 8 R''" and a Borel-measurable 
function F : 6 x M'^« ^ E'^" such that 

F{e,o - g{0) ^ F{0,o - {'nF){e,o 

for every 6* G 0, ^ G R'^s. Due to the same results, there exists a locally Lipschitz 
continuous function /i : — > R and for any compact set Q C M''", there exist real 
numbers (52, q £ (0, 1), C2,q G [1, oo) such that 

max{||(n"F)(0,C) -g(0)||, |(n»(0,o - < C2^qSIq{i + mf, (11-4) 

m^x{\\F{9,m, mF){9,m} < C2,q(1 + 1'^"^' 
\\F{9',O-F{e",O\\<C2,Q\\0'~9"\\{l + 



for each $\theta, \theta', \theta'' \in Q$, $\xi \in \mathbb{R}^{d_\xi}$. Combining (11.1), (11.2), (11.4) with the

dominated convergence theorem, we get $h(\cdot) = f(\cdot)$, $g(\cdot) = \nabla f(\cdot)$. On the other hand,
owing to the fact that $\{x_n\}_{n\ge 0}$ is a geometrically ergodic Markov chain, we have
that $\{y_n\}_{n\ge 0}$ admits a stationary regime for $n \to \infty$. Consequently, Theorem 6.1
implies that $f(\cdot)$ is analytic on $\Theta$. Then, the theorem's assertion directly follows from
Theorem 3.1. □

Appendix. In this section, we study certain aspects of Assumption 2.3. More
specifically, we show that Assumption 2.3 is true if its 'local version', Assumption
2.3' (below), holds. We also demonstrate that the Lojasiewicz coefficients $\delta_{Q,a}$, $\mu_{Q,a}$
and $M_{Q,a}$ have 'measurable versions' for which $\hat\delta$, $\hat\mu$ and $\hat M$ (defined in Section 2) are
random variables in probability space $(\Omega, \mathcal{F}, P)$ (i.e., measurable with respect to $\mathcal{F}$).
We study these aspects of Assumption 2.3 under the following condition:

Assumption 2.3'. There exists an open vicinity $U$ of $S$ with the following
property: For any compact set $Q \subset U$ and any real number $a \in f(Q)$, there exist real
numbers $\delta'_{Q,a} \in (0, 1)$, $\mu'_{Q,a} \in (1, 2]$, $M'_{Q,a} \in [1, \infty)$ such that

$$|f(\theta) - a| \le M'_{Q,a} \|\nabla f(\theta)\|^{\mu'_{Q,a}}$$

for all $\theta \in Q$ satisfying $|f(\theta) - a| \le \delta'_{Q,a}$.

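Throughout this section, we rely on the following notation. $\varepsilon \in (0, 1)$ is a fixed
constant. For a compact set $Q \subset \mathbb{R}^{d_\theta}$, $a \in f(Q)$ and $\delta \in (0, 1)$, let

$$\phi_{Q,a}(\delta) = \sup\left\{ \frac{1}{2},\ \frac{\log\|\nabla f(\theta)\|}{\log|f(\theta) - a|} : \theta \in Q \setminus S,\ 0 < |f(\theta) - a| \le \delta \right\},$$

while

$$\delta_{Q,a} = \sup\{\varepsilon\delta : \delta \in (0,1),\ \phi_{Q,a}(\delta) < 1\}$$

and $\mu_{Q,a} = 1/\phi_{Q,a}(\delta_{Q,a})$, $M_{Q,a} = 1$.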
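For intuition, these quantities can be evaluated numerically in a scalar example. The sketch below assumes $f(\theta) = \theta^4$ with $a = 0$ (so that $S = \{0\}$) and reads the definition of $\phi_{Q,a}(\delta)$ as the supremum of $1/2$ and the ratios $\log\|\nabla f(\theta)\| / \log|f(\theta) - a|$ over $\theta \in Q \setminus S$ with $0 < |f(\theta) - a| \le \delta$. The supremum is approximated on a finite grid, so the computed value is only a lower estimate of $\phi_{Q,a}(\delta)$. The example function, the grid and the function names are assumptions made for illustration.

```python
import math


def f(theta):
    """Hypothetical objective with a degenerate minimum at 0."""
    return theta ** 4


def grad_f(theta):
    return 4.0 * theta ** 3


def phi_on_grid(grid, a, delta):
    """Grid approximation of sup{1/2, log|f'| / log|f - a|}; delta < 1 assumed."""
    best = 0.5
    for theta in grid:
        gap = abs(f(theta) - a)
        g = abs(grad_f(theta))
        if g > 0.0 and 0.0 < gap <= delta:
            best = max(best, math.log(g) / math.log(gap))
    return best


def lojasiewicz_exponent(grid, a, delta):
    """mu = 1 / phi, mirroring mu_{Q,a} = 1 / phi_{Q,a}(delta_{Q,a})."""
    return 1.0 / phi_on_grid(grid, a, delta)


# Geometric grid approaching the stationary point 0 from the right.
grid = [0.8 * 0.9 ** k for k in range(101)]
mu = lojasiewicz_exponent(grid, a=0.0, delta=0.5)
```

For this example the ratio equals $3/4 + \log 4 / (4\log|\theta|)$, which increases towards $3/4$ as $\theta \to 0$, so the grid estimate of $\mu$ lies slightly above $4/3$, and $|f(\theta)| \le \|\nabla f(\theta)\|^{\mu}$ holds at every grid point with $M = 1$.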

Lemma A.1. Let Assumption 2.3' hold. Moreover, let $Q \subset \mathbb{R}^{d_\theta}$ be an arbitrary
compact set, while $a \in f(Q)$ is an arbitrary real number. Then, $\delta_{Q,a}$, $\mu_{Q,a}$, $M_{Q,a}$
specified in this section satisfy all requirements of Assumption 2.3.

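Proof. First, we show $\delta_{Q,a} > 0$. To do so, we consider separately the following
cases:

Case $Q \cap S = \emptyset$: Let

$$\tilde\delta_{Q,a} = \inf\{\exp(-2|\log\|\nabla f(\theta)\||) : \theta \in Q\}.$$

Obviously, $0 < \tilde\delta_{Q,a} \le 1$ (notice that $\inf_{\theta\in Q} \|\nabla f(\theta)\| > 0$). We also have

$$2|\log\|\nabla f(\theta)\|| \le \log(1/\tilde\delta_{Q,a}) \tag{A.1}$$

for all $\theta \in Q$. Consequently,

$$\frac{\log\|\nabla f(\theta)\|}{\log|f(\theta) - a|} \le \frac{|\log\|\nabla f(\theta)\||}{\log(1/\tilde\delta_{Q,a})} \le \frac{1}{2} \tag{A.2}$$

for any $\theta \in Q$ satisfying $0 < |f(\theta) - a| \le \tilde\delta_{Q,a}$. Thus, $\phi_{Q,a}(\delta) \le 1/2$ for each
$\delta \in (0, \tilde\delta_{Q,a}]$, and hence, $\delta_{Q,a} \ge \varepsilon\tilde\delta_{Q,a} > 0$.

Case $Q \cap S \ne \emptyset$, $a \notin f(Q \cap S)$: Let

$$\tilde\delta'_{Q,a} = \frac{1}{2}\inf\{1, |f(\theta) - a| : \theta \in Q \cap S\},$$

$$\tilde\delta''_{Q,a} = \inf\{\exp(-2|\log\|\nabla f(\theta)\||) : \theta \in Q,\ |f(\theta) - a| \le \tilde\delta'_{Q,a}\},$$

while $\tilde\delta_{Q,a} = \min\{\tilde\delta'_{Q,a}, \tilde\delta''_{Q,a}\}$. Obviously, $0 < \tilde\delta_{Q,a} \le 1/2$ (notice that $0 < \tilde\delta'_{Q,a} \le 1/2$
and that $\theta \notin Q \cap S$ if $|f(\theta) - a| \le \tilde\delta'_{Q,a}$; also notice that $0 < \inf\{\|\nabla f(\theta)\| : \theta \in Q,\ |f(\theta) -
a| \le \tilde\delta'_{Q,a}\}$). Moreover, (A.1) holds for all $\theta \in Q$ satisfying $0 < |f(\theta) - a| \le \tilde\delta_{Q,a}$.
Then, (A.2) is true for any $\theta \in Q$ fulfilling $0 < |f(\theta) - a| \le \tilde\delta_{Q,a}$. Hence, $\phi_{Q,a}(\delta) \le 1/2$
for all $\delta \in (0, \tilde\delta_{Q,a}]$, and consequently, $\delta_{Q,a} \ge \varepsilon\tilde\delta_{Q,a} > 0$.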

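Case $Q \cap S \ne \emptyset$, $a \in f(Q \cap S)$: Let $\rho_Q = d(Q \cap S, U^c)/2$ and $\tilde Q = \{\theta \in \mathbb{R}^{d_\theta} :
d(\theta, Q \cap S) \le \rho_Q\}$, while $\tilde\delta'_{Q,a} = \delta'_{\tilde Q,a}$, $\tilde\mu_{Q,a} = \mu'_{\tilde Q,a}$, $\tilde M_{Q,a} = M'_{\tilde Q,a}$ ($\delta'_{\tilde Q,a}$, $\mu'_{\tilde Q,a}$, $M'_{\tilde Q,a}$
are introduced in Assumption 2.3'). Moreover, let

$$\tilde\delta''_{Q,a} = \inf\left\{\frac{1}{2}\exp(-2|\log\|\nabla f(\theta)\||) : \theta \in Q \setminus \tilde Q\right\}$$

and $\tilde\delta_{Q,a} = \min\{\tilde\delta'_{Q,a}, \tilde\delta''_{Q,a}, \tilde M_{Q,a}^{-2/(\tilde\mu_{Q,a}-1)}\}$. Obviously, $\tilde Q \subset U$ and $0 < \tilde\delta_{Q,a} \le 1/2$.

Moreover, (A.1) is true for all $\theta \in Q \setminus \tilde Q$. Therefore, (A.2) holds for all $\theta \in Q \setminus \tilde Q$
satisfying $0 < |f(\theta) - a| \le \tilde\delta_{Q,a}$. On the other hand, Assumption 2.3' implies

$$\log|f(\theta) - a| \le \log\tilde M_{Q,a} + \tilde\mu_{Q,a}\log\|\nabla f(\theta)\|$$

for all $\theta \in \tilde Q \setminus S$ satisfying $0 < |f(\theta) - a| \le \tilde\delta_{Q,a}$ (notice that $\tilde\delta_{Q,a} \le \delta'_{\tilde Q,a}$). Consequently,

$$\frac{\log\|\nabla f(\theta)\|}{\log|f(\theta) - a|} \le \frac{1}{\tilde\mu_{Q,a}}\left(1 - \frac{\log\tilde M_{Q,a}}{\log|f(\theta) - a|}\right) \le \frac{1}{\tilde\mu_{Q,a}}\left(1 + \frac{\log\tilde M_{Q,a}}{\log(1/\tilde\delta_{Q,a})}\right) \le \frac{\tilde\mu_{Q,a} + 1}{2\tilde\mu_{Q,a}} < 1 \tag{A.3}$$

for all $\theta \in \tilde Q \setminus S$ satisfying $0 < |f(\theta) - a| \le \tilde\delta_{Q,a}$ (notice that $\log(1/\tilde\delta_{Q,a}) \ge
2\log\tilde M_{Q,a}/(\tilde\mu_{Q,a} - 1)$). Thus, as a result of (A.2), (A.3), we have $\phi_{Q,a}(\delta) < 1$ for
all $\delta \in (0, \tilde\delta_{Q,a}]$, and consequently, $\delta_{Q,a} \ge \varepsilon\tilde\delta_{Q,a} > 0$.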

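Now, we prove that $\delta_{Q,a}$, $\mu_{Q,a}$, $M_{Q,a}$ fulfill all other requirements of Assumption
2.3. By the definition of $\phi_{Q,a}(\cdot)$ and $\delta_{Q,a}$, we have $0 < \delta_{Q,a} < 1$, $1/2 \le \phi_{Q,a}(\delta_{Q,a}) < 1$
and

$$\frac{\log\|\nabla f(\theta)\|}{\log|f(\theta) - a|} \le \phi_{Q,a}(\delta_{Q,a})$$

for all $\theta \in Q \setminus S$ satisfying $0 < |f(\theta) - a| \le \delta_{Q,a}$. Therefore, $1 < \mu_{Q,a} = 1/\phi_{Q,a}(\delta_{Q,a}) \le
2$ and

$$\mu_{Q,a}\log\|\nabla f(\theta)\| \ge \log|f(\theta) - a|$$

for each $\theta \in Q \setminus S$ fulfilling $0 < |f(\theta) - a| \le \delta_{Q,a}$. Hence, the inequality required in Assumption 2.3 holds for all $\theta \in Q$
satisfying $0 < |f(\theta) - a| \le \delta_{Q,a}$.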

Lemma A. 2. Let S, fi, M be defined using \2.4\ l, IKM '"^'^ ^Q,a, fJ'Q,a, Mg^a 
specified in this section. Then, $\hat\delta$, $\hat\mu$, $\hat M$ are random variables in probability space
$(\Omega, \mathcal{F}, P)$.
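Proof. For $\theta \in \mathbb{R}^{d_\theta}$, $\delta \in (0, 1)$, let

$$\Phi(\theta, \delta) = \frac{\log\|\nabla f(\theta)\|}{\log|f(\theta) - \hat f|}\, I_{S^c}(\theta)\, I_{(0,\delta]}\left(|f(\theta) - \hat f|\right) I_{[0,\rho]}\left(\liminf_{n\to\infty}\|\theta - \theta_n\|\right)$$

($\rho$ is specified in the definition of $\hat Q$, Section 2), while

$$\phi(\delta) = \sup\{1/2,\ \Phi(\theta, \delta) : \theta \in \mathbb{R}^{d_\theta}\}\, I_\Lambda$$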

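($\Lambda$ is defined in Section 7). Obviously, $\Phi(\theta, \delta)$ and $\phi(\delta)$ are measurable random functions
of $(\theta, \delta)$ and $\delta$ (i.e., $\Phi(\theta, \delta)$ and $\phi(\delta)$ are measurable with respect to $\sigma$-algebras
$\mathcal{B}(\mathbb{R}^{d_\theta}) \times \mathcal{B}((0, 1)) \times \mathcal{F}$ and $\mathcal{B}((0, 1)) \times \mathcal{F}$). On the other hand, it is straightforward to
verify that

$$\hat\delta = \sup\{\varepsilon\delta : \delta \in (0,1),\ \phi(\delta) < 1\}$$

and $\hat\mu = 1/\phi(\hat\delta)$ on $\Lambda$. Then, it is clear that $\hat\delta$, $\hat\mu$, $\hat M$ are random variables in probability
space $(\Omega, \mathcal{F}, P)$.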